Fundamentals of Error-Correcting Codes
Fundamentals of Error-Correcting Codes is an in-depth introduction to coding theory from
both an engineering and a mathematical viewpoint. As well as covering classical topics, it
includes extensive coverage of recent techniques that until now could only be found in
specialist journals and book publications. Numerous exercises and examples and an accessible
writing style make this a lucid and effective introduction to coding theory for advanced
undergraduate and graduate students, researchers, and engineers, whether approaching the
subject from a mathematical, engineering, or computer science background.
Professor W. Cary Huffman graduated with a PhD in mathematics from the California Institute
of Technology in 1974. He taught at Dartmouth College and Union College until he joined
the Department of Mathematics and Statistics at Loyola in 1978, serving as chair of the
department from 1986 through 1992. He is the author of approximately 40 research papers
in finite group theory, combinatorics, and coding theory, which have appeared in journals
such as the Journal of Algebra, IEEE Transactions on Information Theory, and the Journal
of Combinatorial Theory.
Professor Vera Pless was an undergraduate at the University of Chicago and received her
PhD from Northwestern in 1957. After ten years at the Air Force Cambridge Research
Laboratory, she spent a few years at MIT’s project MAC. She joined the University of
Illinois-Chicago’s Department of Mathematics, Statistics, and Computer Science as a full
professor in 1975 and has been there ever since. She is a University of Illinois Scholar and
has published over 100 papers.
Fundamentals of
Error-Correcting Codes
W. Cary Huffman
Loyola University of Chicago
and
Vera Pless
University of Illinois at Chicago
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge, United Kingdom
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521782807
© Cambridge University Press 2003
This book is in copyright. Subject to statutory exception and to the provisions of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
First published in print format 2003
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this book, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
To Gayle, Kara, and Jonathan
Bill, Virginia, and Mike
Min and Mary
Thanks for all your strength and encouragement
W. C. H.
To my children Nomi, Ben, and Dan
for their support
and grandchildren Lilah, Evie, and Becky
for their love
V. P.
Contents

Preface

1  Basic concepts of linear codes
   1.1  Three fields
   1.2  Linear codes, generator and parity check matrices
   1.3  Dual codes
   1.4  Weights and distances
   1.5  New codes from old
        1.5.1  Puncturing codes
        1.5.2  Extending codes
        1.5.3  Shortening codes
        1.5.4  Direct sums
        1.5.5  The (u | u + v) construction
   1.6  Permutation equivalent codes
   1.7  More general equivalence of codes
   1.8  Hamming codes
   1.9  The Golay codes
        1.9.1  The binary Golay codes
        1.9.2  The ternary Golay codes
   1.10 Reed–Muller codes
   1.11 Encoding, decoding, and Shannon's Theorem
        1.11.1  Encoding
        1.11.2  Decoding and Shannon's Theorem
   1.12 Sphere Packing Bound, covering radius, and perfect codes

2  Bounds on the size of codes
   2.1  Aq(n, d) and Bq(n, d)
   2.2  The Plotkin Upper Bound
   2.3  The Johnson Upper Bounds
        2.3.1  The Restricted Johnson Bound
        2.3.2  The Unrestricted Johnson Bound
        2.3.3  The Johnson Bound for Aq(n, d)
        2.3.4  The Nordstrom–Robinson code
        2.3.5  Nearly perfect binary codes
   2.4  The Singleton Upper Bound and MDS codes
   2.5  The Elias Upper Bound
   2.6  The Linear Programming Upper Bound
   2.7  The Griesmer Upper Bound
   2.8  The Gilbert Lower Bound
   2.9  The Varshamov Lower Bound
   2.10 Asymptotic bounds
        2.10.1  Asymptotic Singleton Bound
        2.10.2  Asymptotic Plotkin Bound
        2.10.3  Asymptotic Hamming Bound
        2.10.4  Asymptotic Elias Bound
        2.10.5  The MRRW Bounds
        2.10.6  Asymptotic Gilbert–Varshamov Bound
   2.11 Lexicodes

3  Finite fields
   3.1  Introduction
   3.2  Polynomials and the Euclidean Algorithm
   3.3  Primitive elements
   3.4  Constructing finite fields
   3.5  Subfields
   3.6  Field automorphisms
   3.7  Cyclotomic cosets and minimal polynomials
   3.8  Trace and subfield subcodes

4  Cyclic codes
   4.1  Factoring x^n − 1
   4.2  Basic theory of cyclic codes
   4.3  Idempotents and multipliers
   4.4  Zeros of a cyclic code
   4.5  Minimum distance of cyclic codes
   4.6  Meggitt decoding of cyclic codes
   4.7  Affine-invariant codes

5  BCH and Reed–Solomon codes
   5.1  BCH codes
   5.2  Reed–Solomon codes
   5.3  Generalized Reed–Solomon codes
   5.4  Decoding BCH codes
        5.4.1  The Peterson–Gorenstein–Zierler Decoding Algorithm
        5.4.2  The Berlekamp–Massey Decoding Algorithm
        5.4.3  The Sugiyama Decoding Algorithm
        5.4.4  The Sudan–Guruswami Decoding Algorithm
   5.5  Burst errors, concatenated codes, and interleaving
   5.6  Coding for the compact disc
        5.6.1  Encoding
        5.6.2  Decoding

6  Duadic codes
   6.1  Definition and basic properties
   6.2  A bit of number theory
   6.3  Existence of duadic codes
   6.4  Orthogonality of duadic codes
   6.5  Weights in duadic codes
   6.6  Quadratic residue codes
        6.6.1  QR codes over fields of characteristic 2
        6.6.2  QR codes over fields of characteristic 3
        6.6.3  Extending QR codes
        6.6.4  Automorphisms of extended QR codes

7  Weight distributions
   7.1  The MacWilliams equations
   7.2  Equivalent formulations
   7.3  A uniqueness result
   7.4  MDS codes
   7.5  Coset weight distributions
   7.6  Weight distributions of punctured and shortened codes
   7.7  Other weight enumerators
   7.8  Constraints on weights
   7.9  Weight preserving transformations
   7.10 Generalized Hamming weights

8  Designs
   8.1  t-designs
   8.2  Intersection numbers
   8.3  Complementary, derived, and residual designs
   8.4  The Assmus–Mattson Theorem
   8.5  Codes from symmetric 2-designs
   8.6  Projective planes
   8.7  Cyclic projective planes
   8.8  The nonexistence of a projective plane of order 10
   8.9  Hadamard matrices and designs

9  Self-dual codes
   9.1  The Gleason–Pierce–Ward Theorem
   9.2  Gleason polynomials
   9.3  Upper bounds
   9.4  The Balance Principle and the shadow
   9.5  Counting self-orthogonal codes
   9.6  Mass formulas
   9.7  Classification
        9.7.1  The Classification Algorithm
        9.7.2  Gluing theory
   9.8  Circulant constructions
   9.9  Formally self-dual codes
   9.10 Additive codes over F4
   9.11 Proof of the Gleason–Pierce–Ward Theorem
   9.12 Proofs of some counting formulas

10 Some favorite self-dual codes
   10.1 The binary Golay codes
        10.1.1  Uniqueness of the binary Golay codes
        10.1.2  Properties of binary Golay codes
   10.2 Permutation decoding
   10.3 The hexacode
        10.3.1  Uniqueness of the hexacode
        10.3.2  Properties of the hexacode
        10.3.3  Decoding the Golay code with the hexacode
   10.4 The ternary Golay codes
        10.4.1  Uniqueness of the ternary Golay codes
        10.4.2  Properties of ternary Golay codes
   10.5 Symmetry codes
   10.6 Lattices and self-dual codes

11 Covering radius and cosets
   11.1 Basics
   11.2 The Norse Bound and Reed–Muller codes
   11.3 Covering radius of BCH codes
   11.4 Covering radius of self-dual codes
   11.5 The length function
   11.6 Covering radius of subcodes
   11.7 Ancestors, descendants, and orphans

12 Codes over Z4
   12.1 Basic theory of Z4-linear codes
   12.2 Binary codes from Z4-linear codes
   12.3 Cyclic codes over Z4
        12.3.1  Factoring x^n − 1 over Z4
        12.3.2  The ring Rn = Z4[x]/(x^n − 1)
        12.3.3  Generating polynomials of cyclic codes over Z4
        12.3.4  Generating idempotents of cyclic codes over Z4
   12.4 Quadratic residue codes over Z4
        12.4.1  Z4-quadratic residue codes: p ≡ −1 (mod 8)
        12.4.2  Z4-quadratic residue codes: p ≡ 1 (mod 8)
        12.4.3  Extending Z4-quadratic residue codes
   12.5 Self-dual codes over Z4
        12.5.1  Mass formulas
        12.5.2  Self-dual cyclic codes
        12.5.3  Lattices from self-dual codes over Z4
   12.6 Galois rings
   12.7 Kerdock codes
   12.8 Preparata codes

13 Codes from algebraic geometry
   13.1 Affine space, projective space, and homogenization
   13.2 Some classical codes
        13.2.1  Generalized Reed–Solomon codes revisited
        13.2.2  Classical Goppa codes
        13.2.3  Generalized Reed–Muller codes
   13.3 Algebraic curves
   13.4 Algebraic geometry codes
   13.5 The Gilbert–Varshamov Bound revisited
        13.5.1  Goppa codes meet the Gilbert–Varshamov Bound
        13.5.2  Algebraic geometry codes exceed the Gilbert–Varshamov Bound

14 Convolutional codes
   14.1 Generator matrices and encoding
   14.2 Viterbi decoding
        14.2.1  State diagrams
        14.2.2  Trellis diagrams
        14.2.3  The Viterbi Algorithm
   14.3 Canonical generator matrices
   14.4 Free distance
   14.5 Catastrophic encoders

15 Soft decision and iterative decoding
   15.1 Additive white Gaussian noise
   15.2 A Soft Decision Viterbi Algorithm
   15.3 The General Viterbi Algorithm
   15.4 Two-way APP decoding
   15.5 Message passing decoding
   15.6 Low density parity check codes
   15.7 Turbo codes
   15.8 Turbo decoding
   15.9 Some space history

References
Symbol index
Subject index
Preface
Coding theory originated with the 1948 publication of the paper “A mathematical theory
of communication” by Claude Shannon. For the past half century, coding theory has grown
into a discipline intersecting mathematics and engineering with applications to almost every
area of communication such as satellite and cellular telephone transmission, compact disc
recording, and data storage.
During the 50th anniversary year of Shannon’s seminal paper, the two-volume Handbook
of Coding Theory, edited by the authors of the current text, was published by Elsevier
Science. That Handbook, with contributions from 33 authors, covers a wide range of topics
at the frontiers of research. As editors of the Handbook, we felt it would be appropriate
to produce a textbook that could serve in part as a bridge to the Handbook. This textbook
is intended to be an in-depth introduction to coding theory from both a mathematical and
engineering viewpoint suitable either for the classroom or for individual study. Several of
the topics are classical, while others cover current subjects that appear only in specialized
books and journal publications. We hope that the presentation in this book, with its numerous
examples and exercises, will serve as a lucid introduction that will enable readers to pursue
some of the many themes of coding theory.
Fundamentals of Error-Correcting Codes is a largely self-contained textbook suitable
for advanced undergraduate students and graduate students at any level. A prerequisite for
this book is a course in linear algebra. A course in abstract algebra is recommended, but not
essential. This textbook could be used for at least three semesters. A wide variety of examples
illustrate both theory and computation. Over 850 exercises are interspersed at points in the
text where they are most appropriate to attempt. Most of the theory is accompanied by
detailed proofs, with some proofs left to the exercises. Because of the number of examples
and exercises that directly illustrate the theory, the instructor can easily choose either to
emphasize or deemphasize proofs.
In this preface we briefly describe the contents of the 15 chapters and give a suggested
outline for the first semester. We also propose blocks of material that can be combined in a
variety of ways to make up subsequent courses. Chapter 1 is basic with the introduction of
linear codes, generator and parity check matrices, dual codes, weight and distance, encoding
and decoding, and the Sphere Packing Bound. The Hamming codes, Golay codes, binary
Reed–Muller codes, and the hexacode are introduced. Shannon’s Theorem for the binary
symmetric channel is discussed. Chapter 1 is certainly essential for the understanding of
the remainder of the book.
Chapter 2 covers the main upper and lower bounds on the size of linear and nonlinear
codes. These include the Plotkin, Johnson, Singleton, Elias, Linear Programming, Griesmer,
Gilbert, and Varshamov Bounds. Asymptotic versions of most of these are included. MDS
codes and lexicodes are introduced.
Chapter 3 is an introduction to constructions and properties of finite fields, with a few
proofs omitted. A quick treatment of this chapter is possible if the students are familiar
with constructing finite fields, irreducible polynomials, factoring polynomials over finite
fields, and Galois theory of finite fields. Much of Chapter 3 is immediately used in the study
of cyclic codes in Chapter 4. Even with a background in finite fields, cyclotomic cosets
(Section 3.7) may be new to the student.
Chapter 4 gives the basic theory of cyclic codes. Our presentation interrelates the concepts of idempotent generator, generator polynomial, zeros of a code, and defining sets.
Multipliers are used to explore equivalence of cyclic codes. Meggitt decoding of cyclic
codes is presented as are extended cyclic and affine-invariant codes.
Chapter 5 looks at the special families of BCH and Reed–Solomon cyclic codes as well as
generalized Reed–Solomon codes. Four decoding algorithms for these codes are presented.
Burst errors and the technique of concatenation for handling burst errors are introduced
with an application of these ideas to the use of Reed–Solomon codes in the encoding and
decoding of compact discs.
Continuing with the theory of cyclic codes, Chapter 6 presents the theory of duadic
codes, which include the family of quadratic residue codes. Because the complete theory of
quadratic residue codes is only slightly simpler than the theory of duadic codes, the authors
have chosen to present the more general codes and then apply the theory of these codes
to quadratic residue codes. Idempotents of binary and ternary quadratic residue codes are
explicitly computed. As a prelude to Chapter 8, projective planes are introduced as examples
of combinatorial designs held by codewords of a fixed weight in a code.
Chapter 7 expands on the concept of weight distribution defined in Chapter 1. Six equivalent
forms of the MacWilliams equations, including the Pless power moments, which relate
the weight distributions of a code and its dual, are formulated. MDS codes, introduced in
Chapter 2, and coset weight distributions, introduced in Chapter 1, are revisited in more
depth. A proof of a theorem of MacWilliams on weight preserving transformations is given
in Section 7.9.
Chapter 8 delineates the basic theory of block designs particularly as they arise from
the supports of codewords of fixed weight in certain codes. The important theorem of
Assmus–Mattson is proved. The theory of projective planes in connection with codes, first
introduced in Chapter 6, is examined in depth, including a discussion of the nonexistence
of the projective plane of order 10.
Chapter 9 consolidates much of the extensive literature on self-dual codes. The Gleason–
Pierce–Ward Theorem is proved showing why binary, ternary, and quaternary self-dual
codes are the most interesting self-dual codes to study. Gleason polynomials are introduced
and applied to the determination of bounds on the minimum weight of self-dual codes.
Techniques for classifying self-dual codes are presented. Formally self-dual codes and
additive codes over F4, used in correcting errors in quantum computers, share many properties
of self-dual codes; they are introduced in this chapter.
The Golay codes and the hexacode are the subject of Chapter 10. Existence and uniqueness
of these codes are proved. The Pless symmetry codes, which generalize the ternary Golay
codes, are defined and some of their properties are given. The connection between codes
and lattices is developed in the final section of the chapter.
The theory of the covering radius of a code, first introduced in Chapter 1, is the topic
of Chapter 11. The covering radii of BCH codes, Reed–Muller codes, self-dual codes, and
subcodes are examined. The length function, a basic tool in finding bounds on the covering
radius, is presented along with many of its properties.
Chapter 12 examines linear codes over the ring Z4 of integers modulo 4. The theory of
these codes is compared and contrasted with the theory of linear codes over fields. Cyclic,
quadratic residue, and self-dual linear codes over Z4 are defined and analyzed. The nonlinear
binary Kerdock and Preparata codes are presented as the Gray image of certain linear codes
over Z4 , an amazing connection that explains many of the remarkable properties of these
nonlinear codes. To study these codes, Galois rings are defined, analogously to extension
fields of the binary field.
Chapter 13 presents a brief introduction to algebraic geometry which is sufficient for a
basic understanding of algebraic geometry codes. Goppa codes, generalized Reed–Solomon
codes, and generalized Reed–Muller codes can be realized as algebraic geometry codes.
A family of algebraic geometry codes has been shown to exceed the Gilbert–Varshamov
Bound, a result that many believed was not possible.
Until Chapter 14, the codes considered were block codes where encoding depended only
upon the current message. In Chapter 14 we look at binary convolutional codes where
each codeword depends not only on the current message but on some messages in the
past as well. These codes are studied as linear codes over the infinite field of binary rational
functions. State and trellis diagrams are developed for the Viterbi Algorithm, one of the main
decoding algorithms for convolutional codes. Their generator matrices and free distance are
examined.
Chapter 15 concludes the textbook with a look at soft decision and iterative decoding.
Until this point, we had only examined hard decision decoding. We begin with a more
detailed look at communication channels, particularly those subject to additive white Gaussian noise. A soft decision Viterbi decoding algorithm is developed for convolutional codes.
Low density parity check codes and turbo codes are defined and a number of decoders for
these codes are examined. The text concludes with a brief history of the application of codes
to deep space exploration.
The following chapters and sections of this book are recommended as an introductory
one-semester course in coding theory:
• Chapter 1 (except Section 1.7),
• Sections 2.1, 2.3.4, 2.4, 2.7–2.9,
• Chapter 3 (except Section 3.8),
• Chapter 4 (except Sections 4.6 and 4.7),
• Chapter 5 (except Sections 5.4.3, 5.4.4, 5.5, and 5.6), and
• Sections 7.1–7.3.
If it is unlikely that a subsequent course in coding theory will be taught, the material in
Chapter 7 can be replaced by the last two sections of Chapter 5. This material will show
how a compact disc is encoded and decoded, presenting a nice real-world application that
students can relate to.
For subsequent semesters of coding theory, we suggest a combination of some of the
following blocks of material. With each block we have included sections that will hopefully
make the blocks self-contained under the assumption that the first course given above has
been completed. Certainly other blocks are possible. A semester can be made up of more
than one block. Later we give individual chapters or sections that stand alone and can be
used in conjunction with each other or with some of these blocks. The sections and chapters
are listed in the order they should be covered.
• Sections 1.7, 8.1–8.4, 9.1–9.7, and Chapter 10. Sections 8.1–8.4 of this block present the
  essential material relating block designs to codes, with particular emphasis on designs
  arising from self-dual codes. The material from Chapter 9 gives an in-depth study of
  self-dual codes with connections to designs. Chapter 10 studies the Golay codes and hexacode
  in great detail, again using designs to help in the analysis. Section 2.11 can be added to
  this block as the binary Golay codes are lexicodes.
• Sections 1.7, 7.4–7.10, Chapters 8, 9, and 10, and Section 2.11. This is an extension of
  the above block with more on designs from codes and codes from designs. It also looks
  at weight distributions in more depth, part of which is required in Section 9.12. Codes
  closely related to self-dual codes are also examined. This block may require an entire
  semester.
• Sections 4.6, 5.4.3, 5.4.4, 5.5, 5.6, and Chapters 14 and 15. This block covers most of the
  decoding algorithms described in the text but not studied in the first course, including both
  hard and soft decision decoding. It also introduces the important classes of convolutional
  and turbo codes that are used in many applications, particularly in deep space
  communication. This would be an excellent block for engineering students or others interested in
  applications.
• Sections 2.2, 2.3, 2.5, 2.6, 2.10, and Chapter 13. This block finishes the nonasymptotic
  bounds not covered in the first course and presents the asymptotic versions of these bounds.
  The algebraic geometry codes and Goppa codes are important for, among other reasons,
  their relationship to the bounds on families of codes.
• Section 1.7 and Chapters 6 and 12. This block studies two families of codes extensively:
  duadic codes, which include quadratic residue codes, and linear codes over Z4. There
  is enough overlap between the two chapters to warrant studying them together. When
  presenting Section 12.5.1, ideas from Section 9.6 should be discussed. Similarly it is
  helpful to examine Section 10.6 before presenting Section 12.5.3.
The following mini-blocks and chapters could be used in conjunction with one another
or with the above blocks to construct a one-semester course.
• Section 1.7 and Chapter 6. Chapter 6 can stand alone after Section 1.7 is covered.
• Sections 1.7, 8.1–8.4, Chapter 10, and Section 2.11. This mini-block gives an in-depth
  study of the Golay codes and hexacode with the prerequisite material on designs covered
  first.
• Section 1.7 and Chapter 12. After Section 1.7 is covered, Chapter 12 can be used alone
  with the exception of Sections 12.4 and 12.5. Section 12.4 can either be omitted or
  supplemented with material from Section 6.6. Section 12.5 can either be skipped or
  supplemented with material from Sections 9.6 and 10.6.
• Chapter 11. This chapter can stand alone.
• Chapter 14. This chapter can stand alone.
The authors would like to thank a number of people for their advice and suggestions
for this book. Philippe Gaborit tested portions of the text in its earliest form in a coding
theory course he taught at the University of Illinois at Chicago resulting in many helpful
insights. Philippe also provided some of the data used in the tables in Chapter 6. Judy
Walker’s monograph [343] on algebraic geometry codes was invaluable when we wrote
Chapter 13; Judy kindly read this chapter and offered many helpful suggestions. Ian Blake
and Frank Kschischang read and critiqued Chapters 14 and 15 providing valuable direction.
Bob McEliece provided data for some of the figures in Chapter 15. The authors also wish
to thank the staff and associates of Cambridge University Press for their valuable assistance
with production of this book. In particular we thank editorial manager Dr. Philip Meyler,
copy editor Dr. Lesley J. Thomas, and production editor Ms. Lucille Murby. Finally, the
authors would like to thank their students in coding theory courses whose questions and
comments helped refine the text. In particular Jon Lark Kim at the University of Illinois at
Chicago and Robyn Canning at Loyola University of Chicago were most helpful.
We have taken great care to read and reread the text, check the examples, and work the
exercises in an attempt to eliminate errors. As with all texts, errors are still likely to exist.
The authors welcome corrections to any that the readers find. We can be reached at our
e-mail addresses below.
W. Cary Huffman
wch@math.luc.edu
Vera Pless
pless@math.uic.edu
February 1, 2003
1 Basic concepts of linear codes
In 1948 Claude Shannon published a landmark paper “A mathematical theory of communication” [306] that signified the beginning of both information theory and coding theory.
Given a communication channel which may corrupt information sent over it, Shannon
identified a number called the capacity of the channel and proved that arbitrarily reliable
communication is possible at any rate below the channel capacity. For example, when transmitting images of planets from deep space, it is impractical to retransmit the images. Hence
if portions of the data giving the images are altered, due to noise arising in the transmission,
the data may prove useless. Shannon’s results guarantee that the data can be encoded before
transmission so that the altered data can be decoded to the specified degree of accuracy.
Examples of other communication channels include magnetic storage devices, compact
discs, and any kind of electronic communication device such as cellular telephones.
The common feature of communication channels is that information is emanating from a
source and is sent over the channel to a receiver at the other end. For instance in deep space
communication, the message source is the satellite, the channel is outer space together with
the hardware that sends and receives the data, and the receiver is the ground station on Earth.
(Of course, messages travel from Earth to the satellite as well.) For the compact disc, the
message is the voice, music, or data to be placed on the disc, the channel is the disc itself,
and the receiver is the listener. The channel is “noisy” in the sense that what is received
is not always the same as what was sent. Thus if binary data is being transmitted over the
channel, when a 0 is sent, it is hopefully received as a 0 but sometimes will be received as a
1 (or as unrecognizable). Noise in deep space communications can be caused, for example,
by thermal disturbance. Noise in a compact disc can be caused by fingerprints or scratches
on the disc. The fundamental problem in coding theory is to determine what message was
sent on the basis of what is received.
A communication channel is illustrated in Figure 1.1. At the source, a message, denoted
x in the figure, is to be sent. If no modification is made to the message and it is transmitted
directly over the channel, any noise would distort the message so that it is not recoverable.
The basic idea is to embellish the message by adding some redundancy to it so that hopefully
the received message is the original message that was sent. The redundancy is added by the
encoder and the embellished message, called a codeword c in the figure, is sent over the
channel where noise in the form of an error vector e distorts the codeword producing a
received vector y.¹ The received vector is then sent to be decoded where the errors are
removed, the redundancy is then stripped off, and an estimate x̂ of the original message
is produced. Hopefully x̂ = x. (There is a one-to-one correspondence between codewords
and messages. Thus we will often take the point of view that the job of the decoder is to
obtain an estimate ŷ of y and hope that ŷ = c.) Shannon’s Theorem guarantees that our
hopes will be fulfilled a certain percentage of the time. With the right encoding based on the
characteristics of the channel, this percentage can be made as high as we desire, although
not 100%.

    Message source --x--> Encoder --c--> Channel --y--> Decoder --x̂--> Receiver
                                            ^
                                            |
                                    e = e1 · · · en
                                   (error from noise)

    x = x1 · · · xk  (message)        c = c1 · · · cn  (codeword)
    y = c + e  (received vector)      x̂  (estimate of the message)

    Figure 1.1 Communication channel.

¹ Generally our codeword symbols will come from a field Fq with q elements, and our messages and codewords
will be vectors in the vector spaces Fq^k and Fq^n, respectively; if c entered the channel and y exited the channel,
the difference y − c is what we have termed the error e in Figure 1.1.
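The channel model of Figure 1.1 can be illustrated with the simplest possible encoder, the binary 3-fold repetition code; this toy sketch is ours, not part of the text. Each message bit is repeated three times, and the decoder takes a majority vote on each block, so any error vector e with at most one error per block is corrected.

```python
# Toy illustration of Figure 1.1 with the binary 3-fold repetition code:
# message x -> codeword c -> received y = c + e -> estimate x_hat.

def encode(x):
    """Repeat each message bit three times (add redundancy)."""
    return [bit for bit in x for _ in range(3)]

def decode(y):
    """Majority vote on each block of three received bits."""
    return [1 if sum(y[i:i + 3]) >= 2 else 0 for i in range(0, len(y), 3)]

x = [1, 0, 1]                                # message
c = encode(x)                                # codeword: 111 000 111
e = [0, 1, 0, 0, 0, 0, 0, 0, 1]              # noise: at most one error per block
y = [(ci + ei) % 2 for ci, ei in zip(c, e)]  # received vector y = c + e
x_hat = decode(y)
print(x_hat == x)  # True: each block had at most one flipped bit
```

Two errors in the same block would defeat the majority vote, which is exactly why "arbitrarily reliable" communication requires more careful code design than mere repetition.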
The proof of Shannon’s Theorem is probabilistic and nonconstructive. In other words, no
specific codes were produced in the proof that give the desired accuracy for a given channel.
Shannon’s Theorem only guarantees their existence. The goal of research in coding theory is
to produce codes that fulfill the conditions of Shannon’s Theorem. In the pages that follow,
we will present many codes that have been developed since the publication of Shannon’s
work. We will describe the properties of these codes and on occasion connect these codes to
other branches of mathematics. Once the code is chosen for application, encoding is usually
rather straightforward. On the other hand, decoding efficiently can be a much more difficult
task; at various points in this book we will examine techniques for decoding the codes we
construct.
1.1 Three fields
Among all types of codes, linear codes are studied the most. Because of their algebraic
structure, they are easier to describe, encode, and decode than nonlinear codes. The code
alphabet for linear codes is a finite field, although sometimes other algebraic structures
(such as the integers modulo 4) can be used to define codes that are also called “linear.”
In this chapter we will study linear codes whose alphabet is a field Fq , also denoted
GF(q), with q elements. In Chapter 3, we will give the structure and properties of finite
fields. Although we will present our general results over arbitrary fields, we will often
specialize to fields with two, three, or four elements.
A field is an algebraic structure consisting of a set together with two operations, usually called addition (denoted by +) and multiplication (denoted by · but often omitted),
which satisfy certain axioms. Three of the fields that are very common in the study
of linear codes are the binary field with two elements, the ternary field with three elements, and the quaternary field with four elements. One can work with these fields
by knowing their addition and multiplication tables, which we present in the next three
examples.
Example 1.1.1 The binary field F2 with two elements {0, 1} has the following addition and
multiplication tables:

    +  0  1        ·  0  1
    0  0  1        0  0  0
    1  1  0        1  0  1

This is also the ring of integers modulo 2.
Example 1.1.2 The ternary field F3 with three elements {0, 1, 2} has addition and
multiplication tables given by addition and multiplication modulo 3:

    +  0  1  2        ·  0  1  2
    0  0  1  2        0  0  0  0
    1  1  2  0        1  0  1  2
    2  2  0  1        2  0  2  1
Example 1.1.3 The quaternary field F4 with four elements {0, 1, ω, ω̄} is more
complicated. It has the following addition and multiplication tables; F4 is not the ring of integers
modulo 4:

    +  0  1  ω  ω̄        ·  0  1  ω  ω̄
    0  0  1  ω  ω̄        0  0  0  0  0
    1  1  0  ω̄  ω        1  0  1  ω  ω̄
    ω  ω  ω̄  0  1        ω  0  ω  ω̄  1
    ω̄  ω̄  ω  1  0        ω̄  0  ω̄  1  ω

Some fundamental equations are observed in these tables. For instance, one notices that
x + x = 0 for all x ∈ F4. Also ω̄ = ω² = 1 + ω and ω³ = ω̄³ = 1.
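The F4 tables can be checked mechanically. The sketch below is our own illustration (not from the text): it transcribes the two tables into lookup dictionaries, writing the element ω̄ as the string 'W', and verifies the identities just noted.

```python
# Arithmetic in F4 = {0, 1, w, W}, where 'w' stands for omega and 'W' for
# omega-bar, as lookup tables transcribed from Example 1.1.3.

ADD = {
    ('0','0'):'0', ('0','1'):'1', ('0','w'):'w', ('0','W'):'W',
    ('1','0'):'1', ('1','1'):'0', ('1','w'):'W', ('1','W'):'w',
    ('w','0'):'w', ('w','1'):'W', ('w','w'):'0', ('w','W'):'1',
    ('W','0'):'W', ('W','1'):'w', ('W','w'):'1', ('W','W'):'0',
}
MUL = {
    ('0','0'):'0', ('0','1'):'0', ('0','w'):'0', ('0','W'):'0',
    ('1','0'):'0', ('1','1'):'1', ('1','w'):'w', ('1','W'):'W',
    ('w','0'):'0', ('w','1'):'w', ('w','w'):'W', ('w','W'):'1',
    ('W','0'):'0', ('W','1'):'W', ('W','w'):'1', ('W','W'):'w',
}

# x + x = 0 for every x in F4 (the field has characteristic 2):
assert all(ADD[(x, x)] == '0' for x in '01wW')
# omega-bar = omega^2 = 1 + omega:
assert MUL[('w','w')] == 'W' and ADD[('1','w')] == 'W'
# omega^3 = omega-bar^3 = 1:
assert MUL[(MUL[('w','w')], 'w')] == '1'
assert MUL[(MUL[('W','W')], 'W')] == '1'
print("all identities hold")
```

The same dictionary approach works for F2 and F3, though there one can simply compute modulo 2 or 3.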
1.2 Linear codes, generator and parity check matrices
Let Fq^n denote the vector space of all n-tuples over the finite field Fq. An (n, M) code C
over Fq is a subset of Fq^n of size M. We usually write the vectors (a1, a2, . . . , an) in Fq^n in the
form a1a2 · · · an and call the vectors in C codewords. Codewords are sometimes specified
in other ways. The classic example is the polynomial representation used for codewords in
cyclic codes; this will be described in Chapter 4. The field F2 of Example 1.1.1 has had
a very special place in the history of coding theory, and codes over F2 are called binary
codes. Similarly codes over F3 are termed ternary codes, while codes over F4 are called
quaternary codes. The term “quaternary” has also been used to refer to codes over the ring
Z4 of integers modulo 4; see Chapter 12.
Without imposing further structure on a code its usefulness is somewhat limited. The most
useful additional structure to impose is that of linearity. To that end, if C is a k-dimensional
subspace of Fqn , then C will be called an [n, k] linear code over Fq . The linear code C
has q k codewords. The two most common ways to present a linear code are with either a
generator matrix or a parity check matrix. A generator matrix for an [n, k] code C is any
k × n matrix G whose rows form a basis for C. In general there are many generator matrices
for a code. For any set of k independent columns of a generator matrix G, the corresponding
set of coordinates forms an information set for C. The remaining r = n − k coordinates are
termed a redundancy set and r is called the redundancy of C. If the first k coordinates form
an information set, the code has a unique generator matrix of the form [Ik | A] where Ik is
the k × k identity matrix. Such a generator matrix is in standard form. Because a linear code
is a subspace of a vector space, it is the kernel of some linear transformation. In particular,
there is an (n − k) × n matrix H , called a parity check matrix for the [n, k] code C, defined
by
    C = {x ∈ Fqn | H xT = 0}.    (1.1)
Note that the rows of H will also be independent. In general, there are also several possible
parity check matrices for C. The next theorem gives one of them when C has a generator
matrix in standard form. In this theorem AT is the transpose of A.
Theorem 1.2.1 If G = [Ik | A] is a generator matrix for the [n, k] code C in standard form,
then H = [−AT | In−k ] is a parity check matrix for C.
Proof: We clearly have H G T = −AT + AT = O. Thus C is contained in the kernel of the
linear transformation x → H xT . As H has rank n − k, this linear transformation has kernel
of dimension k, which is also the dimension of C. The result follows.
Exercise 1 Prior to the statement of Theorem 1.2.1, it was noted that the rows of the
(n − k) × n parity check matrix H satisfying (1.1) are independent. Why is that so? Hint:
The map x → H xT is a linear transformation from Fqn to Fqn−k with kernel C. From linear
algebra, what is the rank of H ?
Example 1.2.2 The simplest way to encode information in order to recover it in the presence
of noise is to repeat each message symbol a fixed number of times. Suppose that our
information is binary with symbols from the field F2 , and we repeat each symbol n times. If
for instance n = 7, then whenever we want to send a 0 we send 0000000, and whenever we
want to send a 1 we send 1111111. If at most three errors are made in transmission and if
we decode by “majority vote,” then we can correctly determine the information symbol, 0
or 1. In general, our code C is the [n, 1] binary linear code consisting of the two codewords
0 = 00 · · · 0 and 1 = 11 · · · 1 and is called the binary repetition code of length n. This code
can correct up to e = ⌊(n − 1)/2⌋ errors: if at most e errors are made in a received vector,
then the majority of coordinates will be correct, and hence the original sent codeword can
be recovered. If more than e errors are made, these errors cannot be corrected. However,
this code can detect n − 1 errors, as received vectors with between 1 and n − 1 errors will
definitely not be codewords. A generator matrix for the repetition code is
G = [1 | 1 · · · 1],
which is of course in standard form. The corresponding parity check matrix from
Theorem 1.2.1 is
        [1         ]
    H = [⋮   In−1  ] .
        [1         ]
The first coordinate is an information set and the last n − 1 coordinates form a redundancy
set.
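Majority-vote decoding as just described is straightforward to mechanize. A minimal illustrative sketch for n = 7:

```python
# Majority-vote decoding for the [n, 1] binary repetition code, n odd.

def encode(bit, n=7):
    # Repeat the information symbol n times.
    return [bit] * n

def decode(received):
    # Decide by majority: correct whenever at most (n-1)/2 coordinates are flipped.
    return 1 if sum(received) > len(received) // 2 else 0

word = encode(1)                      # 1111111
word[0] = word[3] = word[5] = 0       # three errors; e = (7-1)/2 = 3 is still correctable
assert decode(word) == 1
```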
Exercise 2 How many information sets are there for the [n, 1] repetition code of
Example 1.2.2?
Example 1.2.3 The matrix G = [I4 | A], where

        [1 0 0 0 0 1 1]
    G = [0 1 0 0 1 0 1]
        [0 0 1 0 1 1 0]
        [0 0 0 1 1 1 1]
is a generator matrix in standard form for a [7, 4] binary code that we denote by H3 . By
Theorem 1.2.1 a parity check matrix for H3 is
                        [0 1 1 1 1 0 0]
    H = [AT | I3 ] =    [1 0 1 1 0 1 0] .
                        [1 1 0 1 0 0 1]
This code is called the [7, 4] Hamming code.
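The identity HGT = O of Theorem 1.2.1 can be verified numerically for this G and H; a short sketch:

```python
# Verify H * G^T = O over F2 for the [7,4] Hamming code of Example 1.2.3.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
H = [[0, 1, 1, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 0, 0, 1]]

def mat_mul_transpose_mod2(H, G):
    # Entry (i, j) is the dot product of row i of H with row j of G, mod 2.
    return [[sum(h * g for h, g in zip(hr, gr)) % 2 for gr in G] for hr in H]

assert mat_mul_transpose_mod2(H, G) == [[0] * 4 for _ in range(3)]
```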
Exercise 3 Find at least four information sets in the [7, 4] code H3 from Example 1.2.3.
Find at least one set of four coordinates that do not form an information set.
Often in this text we will refer to a subcode of a code C. If C is not linear (or not known
to be linear), a subcode of C is any subset of C. If C is linear, a subcode will be a subset of
C which must also be linear; in this case a subcode of C is a subspace of C.
1.3 Dual codes
The generator matrix G of an [n, k] code C is simply a matrix whose rows are independent
and span the code. The rows of the parity check matrix H are independent; hence H is the
generator matrix of some code, called the dual or orthogonal of C and denoted C ⊥ . Notice
that C ⊥ is an [n, n − k] code. An alternate way to define the dual code is by using inner
products.
Recall that the ordinary inner product of vectors x = x1 · · · xn, y = y1 · · · yn in Fqn is

    x · y = x1 y1 + x2 y2 + · · · + xn yn .
Therefore from (1.1), we see that C ⊥ can also be defined by
    C⊥ = {x ∈ Fqn | x · c = 0 for all c ∈ C}.    (1.2)
It is a simple exercise to show that if G and H are generator and parity check matrices, respectively, for C, then H and G are generator and parity check matrices, respectively, for C ⊥ .
Exercise 4 Prove that if G and H are generator and parity check matrices, respectively,
for C, then H and G are generator and parity check matrices, respectively, for C ⊥ .
Example 1.3.1 Generator and parity check matrices for the [n, 1] repetition code C are
given in Example 1.2.2. The dual code C ⊥ is the [n, n − 1] code with generator matrix
H and thus consists of all binary n-tuples a1 a2 · · · an−1 b, where b = a1 + a2 + · · · + an−1
(addition in F2 ). The nth coordinate b is an overall parity check for the first n − 1 coordinates
chosen, therefore, so that the sum of all the coordinates equals 0. This makes it easy to see
that G is indeed a parity check matrix for C ⊥ . The code C ⊥ has the property that a single
transmission error can be detected (since the sum of the coordinates will not be 0) but not
corrected (since changing any one of the received coordinates will give a vector whose sum
of coordinates will be 0).
A code C is self-orthogonal provided C ⊆ C ⊥ and self-dual provided C = C ⊥ . The length
n of a self-dual code is even and the dimension is n/2.
Exercise 5 Prove that a self-dual code has even length n and dimension n/2.
Example 1.3.2 One generator matrix for the [7, 4] Hamming code H3 is presented in
Example 1.2.3. Let Ĥ3 be the code of length 8 and dimension 4 obtained from H3 by
adding an overall parity check coordinate to each vector of G and thus to each codeword
of H3. Then

        [1 0 0 0 0 1 1 1]
    Ĝ = [0 1 0 0 1 0 1 1]
        [0 0 1 0 1 1 0 1]
        [0 0 0 1 1 1 1 0]

is a generator matrix for Ĥ3. It is easy to verify that Ĥ3 is a self-dual code.
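Self-duality of Ĥ3 can be checked by confirming that the four rows of the matrix above are pairwise orthogonal (including each row with itself) over F2; the code then lies inside its dual, and since both have dimension 4 = 8/2 they are equal. A quick computational sketch:

```python
# Check that the generator matrix of the extended Hamming code generates
# a self-orthogonal [8,4] binary code; dimension 4 = 8/2 then gives self-duality.
G = [[1, 0, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 0, 1, 0, 1, 1],
     [0, 0, 1, 0, 1, 1, 0, 1],
     [0, 0, 0, 1, 1, 1, 1, 0]]

def dot2(u, v):
    # Ordinary inner product over F2.
    return sum(a * b for a, b in zip(u, v)) % 2

assert all(dot2(u, v) == 0 for u in G for v in G)
```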
Example 1.3.3 The [4, 2] ternary code H3,2, often called the tetracode, has generator matrix
G, in standard form, given by

    G = [1 0 1  1]
        [0 1 1 −1] .

This code is also self-dual.
Exercise 6 Prove that Ĥ3 from Example 1.3.2 and H3,2 from Example 1.3.3 are self-dual
codes.
Exercise 7 Find all the information sets of the tetracode given in Example 1.3.3.
When studying quaternary codes over the field F4 (Example 1.1.3), it is often useful to
consider another inner product, called the Hermitian inner product, given by
    ⟨x, y⟩ = x · ȳ = x1 ȳ1 + x2 ȳ2 + · · · + xn ȳn ,

where ¯ , called conjugation, is given by 0̄ = 0, 1̄ = 1, and ω̄ = ω² (so conjugation interchanges ω and ω̄). Using this inner product,
we can define the Hermitian dual of a quaternary code C to be, analogous to (1.2),
    C⊥H = {x ∈ Fn4 | ⟨x, c⟩ = 0 for all c ∈ C}.
Define the conjugate of C to be

    C̄ = {c̄ | c ∈ C},

where c̄ = c̄1 c̄2 · · · c̄n when c = c1 c2 · · · cn. Notice that C⊥H = C̄⊥. We also have Hermitian
self-orthogonality and Hermitian self-duality: namely, C is Hermitian self-orthogonal if
C ⊆ C⊥H and Hermitian self-dual if C = C⊥H.
Exercise 8 Prove that if C is a code over F4, then C⊥H = C̄⊥.
Example 1.3.4 The [6, 3] quaternary code G6 has generator matrix G6 in standard form
given by

         [1 0 0 1 ω ω]
    G6 = [0 1 0 ω 1 ω] .
         [0 0 1 ω ω 1]
This code is often called the hexacode. It is Hermitian self-dual.
Exercise 9 Verify the following properties of the Hermitian inner product on Fn4:
(a) ⟨x, x⟩ ∈ {0, 1} for all x ∈ Fn4.
(b) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩ for all x, y, z ∈ Fn4.
(c) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ Fn4.
(d) ⟨x, y⟩ is the conjugate of ⟨y, x⟩ for all x, y ∈ Fn4.
(e) ⟨αx, y⟩ = α⟨x, y⟩ for all x, y ∈ Fn4 and all α ∈ F4.
(f) ⟨x, αy⟩ = ᾱ⟨x, y⟩ for all x, y ∈ Fn4 and all α ∈ F4.
Exercise 10 Prove that the hexacode G 6 from Example 1.3.4 is Hermitian self-dual.
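Hermitian self-orthogonality of the rows (Exercise 10) can also be checked by machine. A sketch, with our labeling 0, 1, 2, 3 for 0, 1, ω, ω̄: addition is XOR of the 2-bit labels and multiplication is a lookup table.

```python
# Hermitian self-orthogonality check for the hexacode generator matrix.
# F4 elements labeled 0, 1, 2, 3 for 0, 1, omega, omega-bar.
MUL = [[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 3, 1],
       [0, 3, 1, 2]]

def conj(a):
    # Conjugation fixes 0 and 1 and interchanges omega and omega-bar.
    return {0: 0, 1: 1, 2: 3, 3: 2}[a]

def herm(u, v):
    # Hermitian inner product <u, v> = sum_i u_i * conj(v_i); addition is XOR.
    s = 0
    for a, b in zip(u, v):
        s ^= MUL[a][conj(b)]
    return s

G6 = [[1, 0, 0, 1, 2, 2],
      [0, 1, 0, 2, 1, 2],
      [0, 0, 1, 2, 2, 1]]

# Every pair of rows is Hermitian-orthogonal, so the row space lies in its
# Hermitian dual; dimension 3 = 6/2 then gives Hermitian self-duality.
assert all(herm(u, v) == 0 for u in G6 for v in G6)
```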
1.4 Weights and distances
An important invariant of a code is the minimum distance between codewords. The
(Hamming) distance d(x, y) between two vectors x, y ∈ Fqn is defined to be the number
of coordinates in which x and y differ. The proofs of the following properties of distance
are left as an exercise.
Theorem 1.4.1 The distance function d(x, y) satisfies the following four properties:
(i) (non-negativity) d(x, y) ≥ 0 for all x, y ∈ Fqn .
(ii) d(x, y) = 0 if and only if x = y.
(iii) (symmetry) d(x, y) = d(y, x) for all x, y ∈ Fqn .
(iv) (triangle inequality) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ Fqn .
This theorem makes the distance function a metric on the vector space Fqn .
Exercise 11 Prove Theorem 1.4.1.
The (minimum) distance of a code C is the smallest distance between distinct codewords
and is important in determining the error-correcting capability of C; as we see later, the higher
the minimum distance, the more errors the code can correct. The (Hamming) weight wt(x)
of a vector x ∈ Fqn is the number of nonzero coordinates in x. The proof of the following
relationship between distance and weight is also left as an exercise.
Theorem 1.4.2 If x, y ∈ Fqn , then d(x, y) = wt(x − y). If C is a linear code, the minimum
distance d is the same as the minimum weight of the nonzero codewords of C.
As a result of this theorem, for linear codes, the minimum distance is also called the minimum
weight of the code. If the minimum weight d of an [n, k] code is known, then we refer to
the code as an [n, k, d] code.
Exercise 12 Prove Theorem 1.4.2.
When dealing with codes over F2 , F3 , or F4 , there are some elementary results about
codeword weights that prove to be useful. We collect them here and leave the proof to the
reader.
Theorem 1.4.3 The following hold:
(i) If x, y ∈ Fn2, then

    wt(x + y) = wt(x) + wt(y) − 2wt(x ∩ y),

where x ∩ y is the vector in Fn2 which has 1s precisely in those positions where both
x and y have 1s.
(ii) If x, y ∈ Fn2, then wt(x ∩ y) ≡ x · y (mod 2).
(iii) If x ∈ Fn2, then wt(x) ≡ x · x (mod 2).
(iv) If x ∈ Fn3, then wt(x) ≡ x · x (mod 3).
(v) If x ∈ Fn4, then wt(x) ≡ ⟨x, x⟩ (mod 2).
Exercise 13 Prove Theorem 1.4.3.
Let Ai , also denoted Ai (C), be the number of codewords of weight i in C. The list Ai for
0 ≤ i ≤ n is called the weight distribution or weight spectrum of C. A great deal of research
is devoted to the computation of the weight distribution of specific codes or families of
codes.
Example 1.4.4 Let C be the binary code with generator matrix
1 1 0 0 0 0
G = 0 0 1 1 0 0 .
0 0 0 0 1 1
The weight distribution of C is A0 = A6 = 1 and A2 = A4 = 3. Notice that only the nonzero
Ai are usually listed.
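For small codes the weight distribution can be found by enumerating all q^k codewords. An illustrative sketch applied to the [7, 4] Hamming code of Example 1.2.3:

```python
from itertools import product

# Enumerate all 2^4 codewords of the [7,4] Hamming code and tally weights.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]

def weight_distribution(G, n=7):
    A = [0] * (n + 1)
    for msg in product([0, 1], repeat=len(G)):
        # Codeword = message times G, computed coordinate by coordinate mod 2.
        c = [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]
        A[sum(c)] += 1
    return A

# A0 = 1, A3 = 7, A4 = 7, A7 = 1; all other Ai are 0.
assert weight_distribution(G) == [1, 0, 0, 7, 7, 0, 0, 1]
```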
Exercise 14 Find the weight distribution of the ternary code with generator matrix
1 1 0 0 0 0
G = 0 0 1 1 0 0 .
0 0 0 0 1 1
Compare your result to Example 1.4.4.
Certain elementary facts about the weight distribution are gathered in the following
theorem. Deeper results on the weight distribution of codes will be presented in Chapter 7.
Theorem 1.4.5 Let C be an [n, k, d] code over Fq . Then:
(i) A0 (C) + A1 (C) + · · · + An (C) = q k .
(ii) A0 (C) = 1 and A1 (C) = A2 (C) = · · · = Ad−1 (C) = 0.
(iii) If C is a binary code containing the codeword 1 = 11 · · · 1, then Ai (C) = An−i (C) for
0 ≤ i ≤ n.
(iv) If C is a binary self-orthogonal code, then each codeword has even weight, and C ⊥
contains the codeword 1 = 11 · · · 1.
(v) If C is a ternary self-orthogonal code, then the weight of each codeword is divisible by
three.
(vi) If C is a quaternary Hermitian self-orthogonal code, then the weight of each codeword
is even.
Exercise 15 Prove Theorem 1.4.5.
Theorem 1.4.5(iv) states that all codewords in a binary self-orthogonal code C have even
weight. If we look at the subset of codewords of C that have weights divisible by four, we
surprisingly get a subcode of C; that is, the subset of codewords of weights divisible by four
form a subspace of C. This is not necessarily the case for non-self-orthogonal codes.
Theorem 1.4.6 Let C be an [n, k] self-orthogonal binary code. Let C 0 be the set of codewords in C whose weights are divisible by four. Then either:
(i) C = C 0 , or
(ii) C 0 is an [n, k − 1] subcode of C and C = C 0 ∪ C 1 , where C 1 = x + C 0 for any codeword
x whose weight is even but not divisible by four. Furthermore C 1 consists of all
codewords of C whose weights are not divisible by four.
Proof: By Theorem 1.4.5(iv) all codewords have even weight. Therefore either (i) holds
or there exists a codeword x of even weight but not of weight a multiple of four. Assume
the latter. Let y be another codeword whose weight is even but not a multiple of four.
Then by Theorem 1.4.3(i), wt(x + y) = wt(x) + wt(y) − 2wt(x ∩ y) ≡ 2 + 2 − 2wt(x ∩ y)
(mod 4). But by Theorem 1.4.3(ii), wt(x ∩ y) ≡ x · y (mod 2). Hence wt(x + y) is divisible
by four. Therefore x + y ∈ C 0 . This shows that y ∈ x + C 0 and C = C 0 ∪ (x + C 0 ). That C 0
is a subcode of C and that C 1 = x + C 0 consists of all codewords of C whose weights are
not divisible by four follow from a similar argument.
There is an analogous result to Theorem 1.4.6 where you consider the subset of codewords
of a binary code whose weights are even. In this case the self-orthogonality requirement is
unnecessary; we leave its proof to the exercises.
Theorem 1.4.7 Let C be an [n, k] binary code. Let C e be the set of codewords in C whose
weights are even. Then either:
(i) C = C e , or
(ii) C e is an [n, k − 1] subcode of C and C = C e ∪ C o , where C o = x + C e for any codeword
x whose weight is odd. Furthermore C o consists of all codewords of C whose weights
are odd.
Exercise 16 Prove Theorem 1.4.7.
Exercise 17 Let C be the [6, 3] binary code with generator matrix
1 1 0 0 0 0
G = 0 1 1 0 0 0 .
1 1 1 1 1 1
(a) Prove that C is not self-orthogonal.
(b) Find the weight distribution of C.
(c) Show that the codewords whose weights are divisible by four do not form a subcode
of C.
The next result gives a way to tell when Theorem 1.4.6(i) is satisfied.
Theorem 1.4.8 Let C be a binary linear code.
(i) If C is self-orthogonal and has a generator matrix each of whose rows has weight
divisible by four, then every codeword of C has weight divisible by four.
(ii) If every codeword of C has weight divisible by four, then C is self-orthogonal.
Proof: For (i), let x and y be rows of the generator matrix. By Theorem 1.4.3(i), wt(x + y) =
wt(x) + wt(y) − 2wt(x ∩ y) ≡ 0 + 0 − 2wt(x ∩ y) ≡ 0 (mod 4). Now proceed by induction as every codeword is a sum of rows of the generator matrix. For (ii), let x, y ∈ C. By
Theorem 1.4.3(i) and (ii), 2(x · y) ≡ 2wt(x ∩ y) ≡ 2wt(x ∩ y) − wt(x) − wt(y) ≡ −wt(x +
y) ≡ 0 (mod 4). Thus x · y ≡ 0 (mod 2).
It is natural to ask if Theorem 1.4.8(ii) can be generalized to codes whose codewords
have weights that are divisible by numbers other than four. We say that a code C (over
any field) is divisible provided all codewords have weights divisible by an integer Δ > 1.
The code is said to be divisible by Δ; Δ is called a divisor of C, and the largest such
divisor is called the divisor of C. Thus Theorem 1.4.8(ii) says that binary codes divisible
by Δ = 4 are self-orthogonal. This is not true for binary codes divisible
by Δ = 2, as the next example illustrates. Binary codes divisible by Δ = 2 are called
even.
Example 1.4.9 The dual of the [n, 1] binary repetition code C of Example 1.2.2 consists
of all the even weight vectors of length n. (See also Example 1.3.1.) If n > 2, this code is
not self-orthogonal.
When considering codes over F3 and F4 , the divisible codes with divisors three and
two, respectively, are self-orthogonal as the next theorem shows. This theorem includes the
converse of Theorem 1.4.5(v) and (vi). Part (ii) is found in [217].
Theorem 1.4.10 Let C be a code over Fq , with q = 3 or 4.
(i) When q = 3, every codeword of C has weight divisible by three if and only if C is
self-orthogonal.
(ii) When q = 4, every codeword of C has weight divisible by two if and only if C is
Hermitian self-orthogonal.
Proof: In (i), if C is self-orthogonal, the codewords have weights divisible by three by
Theorem 1.4.5(v). For the converse let x, y ∈ C. We need to show that x · y = 0. We can
view the codewords x and y as having the following pattern of coordinates:

    x:  ⋆  0  =  ≠  0
    y:  0  ⋆  =  ≠  0
        a  b  c  d  e
where there are a coordinates where x is nonzero and y is zero, b coordinates where y is
nonzero and x is zero, c coordinates where both are nonzero and agree, d coordinates where
both are nonzero and disagree, and e coordinates where both are zero. So wt(x + y) = a +
b + c and wt(x − y) = a + b + d. But x ± y ∈ C and hence a + b + c ≡ a + b + d ≡ 0
(mod 3). In particular c ≡ d (mod 3). Therefore x · y = c + 2d ≡ 0 (mod 3), proving (i).
In (ii), if C is Hermitian self-orthogonal, the codewords have even weights by Theorem 1.4.5(vi). For the converse let x ∈ C. If x has a 0s, b 1s, c ωs, and d ω̄s, then b + c + d
is even as wt(x) = b + c + d. However, ⟨x, x⟩ also equals b + c + d (as an element of F4).
Therefore ⟨x, x⟩ = 0 for all x ∈ C. Now let x, y ∈ C. So both x + y and ωx + y are in C. Using
Exercise 9 we have 0 = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ = ⟨x, y⟩ +
⟨y, x⟩. Also 0 = ⟨ωx + y, ωx + y⟩ = ⟨x, x⟩ + ω⟨x, y⟩ + ω̄⟨y, x⟩ + ⟨y, y⟩ = ω⟨x, y⟩ +
ω̄⟨y, x⟩. Combining these, ⟨x, y⟩ must be 0, proving (ii).
The converse of Theorem 1.4.5(iv) is in general not true. The best that can be said in this
case is contained in the following theorem, whose proof we leave as an exercise.
Theorem 1.4.11 Let C be a binary code with a generator matrix each of whose rows has
even weight. Then every codeword of C has even weight.
Exercise 18 Prove Theorem 1.4.11.
Binary codes for which all codewords have weight divisible by four are called doubly-even.² By Theorem 1.4.8, doubly-even codes are self-orthogonal. A self-orthogonal code
must be even by Theorem 1.4.5(iv); one which is not doubly-even is called singly-even.
Exercise 19 Find the minimum weights and weight distributions of the codes H3 in
Example 1.2.3, H3⊥, Ĥ3 in Example 1.3.2, the tetracode in Example 1.3.3, and the hexacode
in Example 1.3.4. Which of the binary codes listed are self-orthogonal? Which are doubly-even? Which are singly-even?
There is a generalization of the concepts of even and odd weight binary vectors to
vectors over arbitrary fields, which is useful in the study of many types of codes. A vector
x = x1 x2 · · · xn in Fqn is even-like provided that

    x1 + x2 + · · · + xn = 0
and is odd-like otherwise. A binary vector is even-like if and only if it has even weight; so
the concept of even-like vectors is indeed a generalization of even weight binary vectors.
The even-like vectors in a code form a subcode of a code over Fq as did the even weight
vectors in a binary code. Except in the binary case, even-like vectors need not have even
weight. The vectors (1, 1, 1) in F33 and (1, ω, ω) in F34 are examples. We say that a code is
even-like if it has only even-like codewords; a code is odd-like if it is not even-like.
Theorem 1.4.12 Let C be an [n, k] code over Fq . Let C e be the set of even-like codewords
in C. Then either:
(i) C = C e , or
(ii) C e is an [n, k − 1] subcode of C.
Exercise 20 Prove Theorem 1.4.12.
There is an elementary relationship between the weight of a codeword and a parity check
matrix for a linear code. This is presented in the following theorem whose proof is left as
an exercise.
Theorem 1.4.13 Let C be a linear code with parity check matrix H . If c ∈ C, the columns
of H corresponding to the nonzero coordinates of c are linearly dependent. Conversely,
if a linear dependence relation with nonzero coefficients exists among w columns of H ,
then there is a codeword in C of weight w whose nonzero coordinates correspond to these
columns.
One way to find the minimum weight d of a linear code is to examine all the nonzero
codewords. The following corollary shows how to use the parity check matrix to find d.
² Some authors reserve the term “doubly-even” for self-dual codes for which all codewords have weight divisible by four.
Corollary 1.4.14 A linear code has minimum weight d if and only if its parity check matrix
has a set of d linearly dependent columns but no set of d − 1 linearly dependent columns.
Exercise 21 Prove Theorem 1.4.13 and Corollary 1.4.14.
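Corollary 1.4.14 suggests a brute-force way to compute d from a parity check matrix: find the smallest number of columns summing to zero (over F2, a dependence relation with nonzero coefficients is exactly a subset of columns with zero sum). A sketch using the parity check matrix of the [7, 4] Hamming code:

```python
from itertools import combinations

# Parity check matrix of the [7,4] Hamming code (Example 1.2.3).
H = [[0, 1, 1, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 0, 0, 1]]
cols = list(zip(*H))

def min_dependent(cols):
    # Smallest d such that some d columns sum to zero over F2; by
    # Corollary 1.4.14 this is the minimum weight of the code.
    for d in range(1, len(cols) + 1):
        for subset in combinations(cols, d):
            if all(sum(col[i] for col in subset) % 2 == 0 for i in range(3)):
                return d

# The seven columns are the distinct nonzero binary triples, so no one or two
# columns are dependent, while columns 1, 2, 3 sum to zero: d = 3.
assert min_dependent(cols) == 3
```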
The minimum weight is also characterized in the following theorem.
Theorem 1.4.15 If C is an [n, k, d] code, then any n − d + 1 coordinate positions contain
an information set. Furthermore, d is the largest number with this property.
Proof: Let G be a generator matrix for C, and consider any set X of s coordinate positions.
To make the argument easier, we assume X is the set of the last s positions. (After we
develop the notion of equivalent codes, the reader will see that this argument is in fact
general.) Suppose X does not contain an information set. Let G = [A | B], where A is
k × (n − s) and B is k × s. Then the column rank of B, and hence the row rank of B,
is less than k. Hence there exists a nontrivial linear combination of the rows of B which
equals 0, and hence a codeword c which is 0 in the last s positions. Since the rows of G are
linearly independent, c = 0 and hence d ≤ n − s, equivalently, s ≤ n − d. The theorem
now follows.
Exercise 22 Find the number of information sets for the [7, 4] Hamming code H3
given in Example 1.2.3. Do the same for the extended Hamming code Ĥ3 from Example
1.3.2.
1.5 New codes from old
As we will see throughout this book, many interesting and important codes will arise by
modifying or combining existing codes. We will discuss five ways to do this.
1.5.1 Puncturing codes
Let C be an [n, k, d] code over Fq . We can puncture C by deleting the same coordinate i
in each codeword. The resulting code is still linear, a fact that we leave as an exercise; its
length is n − 1, and we often denote the punctured code by C ∗ . If G is a generator matrix for
C, then a generator matrix for C ∗ is obtained from G by deleting column i (and omitting a
zero or duplicate row that may occur). What are the dimension and minimum weight of C ∗ ?
Because C contains q k codewords, the only way that C ∗ could contain fewer codewords is if
two codewords of C agree in all but coordinate i. In that case C has minimum distance d = 1
and a codeword of weight 1 whose nonzero entry is in coordinate i. The minimum distance
decreases by 1 only if a minimum weight codeword of C has a nonzero ith coordinate.
Summarizing, we have the following theorem.
Theorem 1.5.1 Let C be an [n, k, d] code over Fq , and let C ∗ be the code C punctured on
the ith coordinate.
(i) If d > 1, C ∗ is an [n − 1, k, d ∗ ] code where d ∗ = d − 1 if C has a minimum weight
codeword with a nonzero ith coordinate and d ∗ = d otherwise.
(ii) When d = 1, C ∗ is an [n − 1, k, 1] code if C has no codeword of weight 1 whose
nonzero entry is in coordinate i; otherwise, if k > 1, C ∗ is an [n − 1, k − 1, d ∗ ] code
with d ∗ ≥ 1.
Exercise 23 Prove directly from the definition that a punctured linear code is also
linear.
Example 1.5.2 Let C be the [5, 2, 2] binary code with generator matrix

    G = [1 1 0 0 0]
        [0 0 1 1 1] .

Let C∗1 and C∗5 be the code C punctured on coordinates 1 and 5, respectively. They have
generator matrices

    G∗1 = [1 0 0 0]   and   G∗5 = [1 1 0 0]
          [0 1 1 1]               [0 0 1 1] .

So C∗1 is a [4, 2, 1] code, while C∗5 is a [4, 2, 2] code.
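Puncturing is a one-line operation on the set of codewords, and both cases of Theorem 1.5.1 can be observed directly. A sketch using the code of Example 1.5.2 (coordinates 0-indexed):

```python
def puncture(codewords, i):
    # Delete coordinate i from every codeword; duplicates collapse via the set.
    return {tuple(c[:i] + c[i + 1:]) for c in map(list, codewords)}

# The four codewords of the [5,2,2] code of Example 1.5.2.
C = {(0, 0, 0, 0, 0), (1, 1, 0, 0, 0), (0, 0, 1, 1, 1), (1, 1, 1, 1, 1)}
C1 = puncture(C, 0)   # puncture on coordinate 1
C5 = puncture(C, 4)   # puncture on coordinate 5

def min_wt(code):
    # Minimum weight over the nonzero codewords.
    return min(sum(c) for c in code if any(c))

# A minimum weight codeword of C is nonzero on coordinate 1 but zero on
# coordinate 5, so d drops to 1 in the first case and stays 2 in the second.
assert min_wt(C1) == 1 and min_wt(C5) == 2
```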
Example 1.5.3 Let D be the [4, 2, 1] binary code with generator matrix

    G = [1 0 0 0]
        [0 1 1 1] .

Let D∗1 and D∗4 be the code D punctured on coordinates 1 and 4, respectively. They have
generator matrices

    G∗1 = [1 1 1]   and   G∗4 = [1 0 0]
                                [0 1 1] .

So D∗1 is a [3, 1, 3] code and D∗4 is a [3, 2, 1] code.
Notice that the code D of Example 1.5.3 is the code C∗1 of Example 1.5.2. Obviously D∗4
could have been obtained from C directly by puncturing on coordinates {1, 5}. In general a
code C can be punctured on the coordinate set T by deleting the components indexed by T
in all codewords of C. If T has size t, the resulting code, which we will often denote C^T,
is an [n − t, k∗, d∗] code with k∗ ≥ k − t and d∗ ≥ d − t by Theorem 1.5.1 and induction.
1.5.2 Extending codes
We can create longer codes by adding a coordinate. There are many possible ways to extend
a code but the most common is to choose the extension so that the new code has only
even-like vectors (as defined in Section 1.4). If C is an [n, k, d] code over Fq, define the
extended code Ĉ to be the code

    Ĉ = {x1 x2 · · · xn+1 ∈ Fqn+1 | x1 x2 · · · xn ∈ C with x1 + x2 + · · · + xn+1 = 0}.
We leave it as an exercise to show that Ĉ is linear. In fact Ĉ is an [n + 1, k, d̂] code, where
d̂ = d or d + 1. Let G and H be generator and parity check matrices, respectively, for C.
Then a generator matrix Ĝ for Ĉ can be obtained from G by adding an extra column to G
so that the sum of the coordinates of each row of Ĝ is 0. A parity check matrix for Ĉ is the
matrix

        [1 · · · 1  1]
    Ĥ = [           0]
        [    H      ⋮]    (1.3)
        [           0]

This construction is also referred to as adding an overall parity check. The [8, 4, 4] binary
code Ĥ3 in Example 1.3.2 obtained from the [7, 4, 3] Hamming code H3 by adding an
overall parity check is called the extended Hamming code.
Exercise 24 Prove directly from the definition that an extended linear code is also
linear.
Exercise 25 Suppose we extend the [n, k] linear code C over the field Fq to the code Ĉ,
where

    Ĉ = {x1 x2 · · · xn+1 ∈ Fqn+1 | x1 x2 · · · xn ∈ C with x1² + x2² + · · · + xn+1² = 0}.

Under what conditions is Ĉ linear?
Exercise 26 Prove that Ĥ in (1.3) is the parity check matrix for an extended code Ĉ, where
C has parity check matrix H.
If C is an [n, k, d] binary code, then the extended code Ĉ contains only even weight
vectors and is an [n + 1, k, d̂] code, where d̂ equals d if d is even and equals d + 1 if d is
odd. This is consistent with the results obtained by extending H3. In the nonbinary case,
however, whether d̂ is d or d + 1 is not so straightforward. For an [n, k, d] code
C over Fq, call the minimum weight of the even-like codewords, respectively the odd-like
codewords, the minimum even-like weight, respectively the minimum odd-like weight, of
the code. Denote the minimum even-like weight by de and the minimum odd-like weight
by do. So d = min{de, do}. If de ≤ do, then Ĉ has minimum weight d̂ = de. If do < de, then
d̂ = do + 1.
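The binary case of this rule can be confirmed by computation; a sketch extending the [7, 4, 3] Hamming code of Example 1.2.3, where d = 3 is odd and so d̂ = 4:

```python
from itertools import product

# Extend the [7,4,3] Hamming code by an overall parity check.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
code = []
for msg in product([0, 1], repeat=4):
    c = [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]
    code.append(c + [sum(c) % 2])   # overall parity check (binary: -x = x)

# Every odd-weight codeword gains one on extension: minimum weight is now 4.
assert min(sum(c) for c in code if any(c)) == 4
```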
Example 1.5.4 Recall that the tetracode H3,2 from Example 1.3.3 is a [4, 2, 3] code over
F3 with generator matrix G and parity check matrix H given by

    G = [1 0 1  1]         H = [−1 −1 1 0]
        [0 1 1 −1]   and       [−1  1 0 1] .

The codeword (1, 0, 1, 1) extends to (1, 0, 1, 1, 0) and the codeword (0, 1, 1, −1) extends
to (0, 1, 1, −1, −1). Hence d = de = do = 3 and d̂ = 3. The generator and parity check
matrices for Ĥ3,2 are

    Ĝ = [1 0 1  1  0]         Ĥ = [ 1  1 1 1 1]
        [0 1 1 −1 −1]   and       [−1 −1 1 0 0]
                                  [−1  1 0 1 0] .
If we extend a code and then puncture the new coordinate, we obtain the original code.
However, performing the operations in the other order will in general result in a different
code.
Example 1.5.5 If we puncture the binary code C with generator matrix

    G = [1 1 0 0 1]
        [0 0 1 1 0]

on its last coordinate and then extend (on the right), the resulting code has generator matrix

    Ĝ = [1 1 0 0 0]
        [0 0 1 1 0] .
In this example, our last step was to extend a binary code with only even weight vectors.
The extended coordinate was always 0. In general, that is precisely what happens when you
extend a code that has only even-like codewords.
Exercise 27 Do the following.
(a) Let C = H3,2 be the [4, 2, 3] tetracode over F3 defined in Example 1.3.3 with generator
matrix

    G = [1 0 1  1]
        [0 1 1 −1] .
Give the generator matrix of the code obtained from C by puncturing on the right-most
coordinate and then extending on the right. Also determine the minimum weight of the
resulting code.
(b) Let C be a code over Fq . Let C 1 be the code obtained from C by puncturing on the
right-most coordinate and then extending this punctured code on the right. Prove that
C = C 1 if and only if C is an even-like code.
(c) With C 1 defined as in (b), prove that if C is self-orthogonal and contains the all-one
codeword 1, then C = C 1 .
(d) With C 1 defined as in (b), prove that C = C 1 if and only if the all-one vector 1 is
in C ⊥ .
1.5.3 Shortening codes
Let C be an [n, k, d] code over Fq and let T be any set of t coordinates. Consider the set
C(T) of codewords which are 0 on T; this set is a subcode of C. Puncturing C(T) on T gives
a code over Fq of length n − t called the code shortened on T and denoted C_T.
Example 1.5.6 Let C be the [6, 3, 2] binary code with generator matrix
1 0 0 1 1 1
G = 0 1 0 1 1 1 .
0 0 1 1 1 1
C ⊥ is also a [6, 3, 2] code with generator matrix
1 1 1 1 0 0
G ⊥ = 1 1 1 0 1 0 .
1 1 1 0 0 1
If the coordinates are labeled 1, 2, . . . , 6, let T = {5, 6}. Generator matrices for the shortened
code C_T and punctured code C^T are

    G_T = [1 0 1 0]               [1 0 0 1]
          [0 1 1 0]   and   G^T = [0 1 0 1]
                                  [0 0 1 1] .
Shortening and puncturing the dual code gives the codes (C⊥)_T and (C⊥)^T, which have
generator matrices

    (G⊥)_T = [1 1 1 1]   and   (G⊥)^T = [1 1 1 1]
                                        [1 1 1 0] .

From the generator matrices G_T and G^T, we find that the duals of C_T and C^T have generator
matrices

    (G_T)⊥ = [1 1 1 0]   and   (G^T)⊥ = [1 1 1 1].
             [0 0 0 1]
Notice that these matrices show that (C⊥)^T = (C_T)⊥ and (C⊥)_T = (C^T)⊥.
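These duality identities can be tested by brute force on small codes; a sketch for this example (coordinates 0-indexed, so T = {5, 6} becomes {4, 5}):

```python
from itertools import product

# Check shortening/puncturing duality for the code of Example 1.5.6.
G = [[1, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 1, 1],
     [0, 0, 1, 1, 1, 1]]
T = [4, 5]

def span(G):
    # All F2-linear combinations of the rows of G.
    return {tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
            for msg in product([0, 1], repeat=len(G))}

def dual(code, n):
    # All length-n vectors orthogonal to every codeword.
    return {v for v in product([0, 1], repeat=n)
            if all(sum(a * b for a, b in zip(v, c)) % 2 == 0 for c in code)}

def punc(code, T):
    # Delete the coordinates indexed by T from every codeword.
    return {tuple(x for i, x in enumerate(c) if i not in T) for c in code}

def short(code, T):
    # Keep codewords that vanish on T, then delete those coordinates.
    return punc({c for c in code if all(c[i] == 0 for i in T)}, T)

C = span(G)
assert short(dual(C, 6), T) == dual(punc(C, T), 4)   # (C⊥) shortened = (C punctured)⊥
assert punc(dual(C, 6), T) == dual(short(C, T), 4)   # (C⊥) punctured = (C shortened)⊥
```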
The conclusions observed in the previous example hold in general.
Theorem 1.5.7 Let C be an [n, k, d] code over Fq. Let T be a set of t coordinates. Then:
(i) (C⊥)^T = (C_T)⊥ and (C⊥)_T = (C^T)⊥, and
(ii) if t < d, then C^T and (C⊥)_T have dimensions k and n − t − k, respectively;
(iii) if t = d and T is the set of coordinates where a minimum weight codeword is nonzero,
then C^T and (C⊥)_T have dimensions k − 1 and n − d − k + 1, respectively.
Proof: Let c be a codeword of C⊥ which is 0 on T and c∗ the codeword c with the coordinates in T removed. So c∗ ∈ (C⊥)_T. If x ∈ C, then 0 = x · c = x∗ · c∗, where x∗ is the
codeword x punctured on T. Thus (C⊥)_T ⊆ (C^T)⊥. Any vector c ∈ (C^T)⊥ can be extended
to a vector ĉ by inserting 0s in the positions of T. If x ∈ C, puncture x on T to obtain x∗.
As 0 = x∗ · c = x · ĉ, we have ĉ ∈ C⊥ and hence c ∈ (C⊥)_T. Thus (C⊥)_T = (C^T)⊥. Replacing C by C⊥ gives (C⊥)^T = (C_T)⊥, completing (i).
Assume t < d. Then n − d + 1 ≤ n − t, implying any n − t coordinates of C contain
an information set by Theorem 1.4.15. Therefore C^T must be k-dimensional and hence
(C⊥)_T = (C^T)⊥ has dimension n − t − k by (i); this proves (ii).
As in (ii), (iii) is completed if we show that C^T has dimension k − 1. If S ⊂ T with S
of size d − 1, C^S has dimension k by part (ii). Clearly C^S has minimum distance 1, and C^T
is obtained by puncturing C^S on the nonzero coordinate of a weight 1 codeword in C^S. By
Theorem 1.5.1(ii) C^T has dimension k − 1.
Exercise 28 Let C be the binary repetition code of length n as described in Example 1.2.2.
Describe (C⊥)_T and (C^T)⊥ for any T .
Exercise 29 Let C be the code of length 6 in Example 1.4.4. Give generator matrices for
(C⊥)_T and (C^T)⊥ when T = {1, 2} and T = {1, 3}.
1.5.4 Direct sums
For i ∈ {1, 2} let C i be an [n i , ki , di ] code, both over the same finite field Fq . Then their
direct sum is the [n 1 + n 2 , k1 + k2 , min{d1 , d2 }] code
C 1 ⊕ C 2 = {(c1 , c2 ) | c1 ∈ C 1 ,c2 ∈ C 2 }.
If C i has generator matrix G i and parity check matrix Hi , then
            G1  O                      H1  O
G1 ⊕ G2 =            and   H1 ⊕ H2 =            (1.4)
            O   G2                     O   H2
are a generator matrix and parity check matrix for C 1 ⊕ C 2 .
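In matrix terms, (1.4) is just a block-diagonal stacking of the two generator (or parity check) matrices. A minimal sketch (the two small matrices below are illustrative choices of ours, not from the text):

```python
def direct_sum(G1, G2):
    """Block-diagonal generator matrix (1.4) for the direct sum C1 ⊕ C2."""
    n1, n2 = len(G1[0]), len(G2[0])
    top = [row + [0] * n2 for row in G1]       # G1 padded with O on the right
    bottom = [[0] * n1 + row for row in G2]    # G2 padded with O on the left
    return top + bottom

# Illustrative generator matrices for a [2,1] and a [3,2] code.
G = direct_sum([[1, 1]], [[1, 0, 1], [0, 1, 1]])
assert G == [[1, 1, 0, 0, 0],
             [0, 0, 1, 0, 1],
             [0, 0, 0, 1, 1]]
```

The same function applied to parity check matrices yields H1 ⊕ H2.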
Exercise 30 Let C i have generator matrix G i and parity check matrix Hi for i ∈ {1, 2}.
Prove that the generator and parity check matrices for C 1 ⊕ C 2 are as given in (1.4).
Exercise 31 Let C be the binary code with generator matrix
1 1 0 0 1 1 0
1 0 1 0 1 0 1
G = 1 0 0 1 1 1 0 .
1 0 1 0 1 1 0
1 0 0 1 0 1 1
Give another generator matrix for C that shows that C is a direct sum of two binary
codes.
Example 1.5.8 The [6, 3, 2] binary code C of Example 1.4.4 is the direct sum D ⊕ D ⊕ D
of the [2, 1, 2] code D = {00, 11}.
Since the minimum distance of the direct sum of two codes does not exceed the minimum
distance of either of the codes, the direct sum of two codes is generally of little use in
applications and is primarily of theoretical interest.
1.5.5 The (u | u + v) construction
Two codes of the same length can be combined to form a third code of twice the length
in a way similar to the direct sum construction. Let C i be an [n, ki , di ] code for i ∈ {1, 2},
both over the same finite field Fq . The (u | u + v) construction produces the [2n, k1 +
k2 , min{2d1 , d2 }] code
C = {(u, u + v) | u ∈ C 1 ,v ∈ C 2 }.
If C i has generator matrix G i and parity check matrix Hi , then generator and parity check
matrices for C are
G1   G1               H1   O
            and                    (1.5)
O    G2              −H2   H2
Exercise 32 Prove that generator and parity check matrices for the code obtained in the
(u | u + v) construction from the codes C i are as given in (1.5).
Example 1.5.9 Consider the [8, 4, 4] binary code C with generator matrix
    1 0 1 0 1 0 1 0
G = 0 1 0 1 0 1 0 1
    0 0 1 1 0 0 1 1
    0 0 0 0 1 1 1 1 .
Then C can be produced from the [4, 3, 2] code C 1 and the [4, 1, 4] code C 2 with generator
matrices
1 0 1 0
G 1 = 0 1 0 1 and G 2 = [1 1 1 1],
0 0 1 1
respectively, using the (u | u + v) construction. Notice that the code C 1 is also constructed
using the (u | u + v) construction from the [2, 2, 1] code C 3 and the [2, 1, 2] code C 4 with
generator matrices
G 3 = 1 0    and    G 4 = [1 1],
      0 1
respectively.
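The (u | u + v) construction in this example is easy to check exhaustively. The following sketch (helper names ours) rebuilds C from the generator matrices of C 1 and C 2 above and confirms the [8, 4, 4] parameters:

```python
from itertools import product

def span(gen):
    """All F2 linear combinations of the generator rows."""
    n = len(gen[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gen)) % 2 for i in range(n))
            for coeffs in product([0, 1], repeat=len(gen))}

def u_u_plus_v(C1, C2):
    """{(u, u + v) : u in C1, v in C2} over F2."""
    return {u + tuple((a + b) % 2 for a, b in zip(u, v)) for u in C1 for v in C2}

C1 = span([(1, 0, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1)])   # the [4,3,2] code
C2 = span([(1, 1, 1, 1)])                               # the [4,1,4] code
C = u_u_plus_v(C1, C2)

d = min(sum(c) for c in C if any(c))
assert len(C) == 2 ** 4 and d == 4    # C is an [8, 4, 4] code
```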
Unlike the direct sum construction of the previous section, the (u | u + v) construction
can produce codes that are important for reasons other than theoretical. For example, the
family of Reed–Muller codes can be constructed in this manner as we see in Section 1.10.
The code in the previous example is one of these codes.
Exercise 33 Prove that the (u | u + v) construction using [n, ki , di ] codes C i produces a
code of dimension k = k1 + k2 and minimum weight d = min{2d1 , d2 }.
1.6 Permutation equivalent codes
In this section and the next, we ask when two codes are “essentially the same.” We term
this concept “equivalence.” Often we are interested in properties of codes, such as weight
distribution, which remain unchanged when passing from one code to another that is essentially the same. Here we focus on the simplest form of equivalence, called permutation
equivalence, and generalize this concept in the next section.
One way to view codes as “essentially the same” is to consider them “the same” if they
are isomorphic as vector spaces. However, in that case the concept of weight, which we
will see is crucial to the study and use of codes, is lost: codewords of one weight may be
sent to codewords of a different weight by the isomorphism. A theorem of MacWilliams
[212], which we will examine in Section 7.9, states that a vector space isomorphism of two
binary codes of length n that preserves the weight of codewords (that is, sends codewords
of one weight to codewords of the same weight) can be extended to an isomorphism of F_2^n
that is a permutation of coordinates. Clearly any permutation of coordinates that sends one
code to another preserves the weight of codewords, regardless of the field. This leads to the
following natural definition of permutation equivalent codes.
Two linear codes C 1 and C 2 are permutation equivalent provided there is a permutation of
coordinates which sends C 1 to C 2 . This permutation can be described using a permutation
matrix, which is a square matrix with exactly one 1 in each row and column and 0s elsewhere.
Thus C 1 and C 2 are permutation equivalent provided there is a permutation matrix P such
that G 1 is a generator matrix of C 1 if and only if G 1 P is a generator matrix of C 2 . The effect
of applying P to a generator matrix is to rearrange the columns of the generator matrix.
If P is a permutation sending C 1 to C 2 , we will write C 1 P = C 2 , where C 1 P = {y | y =
xP for x ∈ C 1 }.
Exercise 34 Prove that if G 1 and G 2 are generator matrices for a code C of length n and
P is an n × n permutation matrix, then G 1 P and G 2 P are generator matrices for C P.
Exercise 35 Suppose C 1 and C 2 are permutation equivalent codes where C 1 P = C 2 for
some permutation matrix P. Prove that:
(a) C1⊥ P = C2⊥ , and
(b) if C 1 is self-dual, so is C 2 .
Example 1.6.1 Let C 1 , C 2 , and C 3 be binary codes with generator matrices
      1 1 0 0 0 0         1 0 0 0 0 1
G 1 = 0 0 1 1 0 0 , G 2 = 0 0 1 1 0 0 , and
      0 0 0 0 1 1         0 1 0 0 1 0

      1 1 0 0 0 0
G 3 = 1 0 1 0 0 0 ,
      1 1 1 1 1 1
respectively. All three codes have weight distribution A0 = A6 = 1 and A2 = A4 = 3. (See
Example 1.4.4 and Exercise 17.) The permutation switching columns 2 and 6 sends G 1
to G 2 , showing that C 1 and C 2 are permutation equivalent. Both C 1 and C 2 are self-dual,
consistent with (a) of Exercise 35. C 3 is not self-dual. Therefore C 1 and C 3 are not permutation equivalent by part (b) of Exercise 35.
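The claims in this example can be verified mechanically. A sketch (helper names ours; the rows of G 3 are assumed to read 110000, 101000, 111111):

```python
from itertools import product
from collections import Counter

def span(gen):
    """All F2 linear combinations of the generator rows."""
    n = len(gen[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gen)) % 2 for i in range(n))
            for coeffs in product([0, 1], repeat=len(gen))}

def weight_distribution(code):
    return Counter(sum(c) for c in code)

dot = lambda x, y: sum(a * b for a, b in zip(x, y)) % 2

C1 = span([(1, 1, 0, 0, 0, 0), (0, 0, 1, 1, 0, 0), (0, 0, 0, 0, 1, 1)])
C3 = span([(1, 1, 0, 0, 0, 0), (1, 0, 1, 0, 0, 0), (1, 1, 1, 1, 1, 1)])

# Same weight distribution: A0 = A6 = 1 and A2 = A4 = 3 ...
assert weight_distribution(C1) == weight_distribution(C3) == Counter({0: 1, 2: 3, 4: 3, 6: 1})
# ... yet C1 is self-dual (3-dimensional and self-orthogonal in F2^6), while C3 is not.
assert all(dot(x, y) == 0 for x in C1 for y in C1)
assert any(dot(x, y) != 0 for x in C3 for y in C3)
```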
The next theorem shows that any code is permutation equivalent to one with generator
matrix in standard form.
Theorem 1.6.2 Let C be a linear code.
(i) C is permutation equivalent to a code which has generator matrix in standard form.
(ii) If I and R are information and redundancy positions, respectively, for C, then R and
I are information and redundancy positions, respectively, for the dual code C ⊥ .
Proof: For (i), apply elementary row operations to any generator matrix of C. This will
produce a new generator matrix of C which has columns the same as those in Ik , but possibly
in a different order. Now choose a permutation of the columns of the new generator matrix
so that these columns are moved to the order that produces [Ik | A]. The code generated by
[Ik | A] is permutation equivalent to C.
If I is an information set for C, then by row reducing a generator matrix for C, we
obtain columns in the information positions that are the columns of Ik in some order. As
above, choose a permutation matrix P to move the columns so that C P has generator matrix
[Ik | A]; P has moved I to the first k coordinate positions. By Theorem 1.2.1, (C P)⊥ has the
last n − k coordinates as information positions. By Exercise 35, (C P)⊥ = C ⊥ P, implying
that R is a set of information positions for C ⊥ , proving (ii).
It is often more convenient to use permutations (in cycle form) rather than permutation
matrices to express equivalence. Let Symn be the set of all permutations of the set of n
coordinates. If σ ∈ Symn and x = x1 x2 · · · xn , define
xσ = y1 y2 · · · yn ,
where y j = x jσ −1 for 1 ≤ j ≤ n.
So xσ = xP, where P = [ pi, j ] is the permutation matrix given by

         1 if j = iσ,
pi, j =                                    (1.6)
         0 otherwise.
This is illustrated in the next example.
Example 1.6.3 Let n = 3, x = x1 x2 x3 , and σ = (1, 2, 3). Then 1σ −1 = 3, 2σ −1 = 1, and
3σ −1 = 2. So xσ = x3 x1 x2 . Let
0 1 0
P = 0 0 1 .
1 0 0
Then xP also equals x3 x1 x2 .
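The rule y_j = x_{jσ^−1} and multiplication by the matrix P of (1.6) can be checked against each other. A sketch (function names ours; a permutation is stored as a dictionary on {1, . . . , n}):

```python
def apply_sigma(x, sigma):
    """xσ: coordinate i of x moves to position iσ, so y_j = x_{j σ^{-1}}."""
    n = len(x)
    y = [None] * n
    for i in range(1, n + 1):
        y[sigma[i] - 1] = x[i - 1]
    return tuple(y)

def perm_matrix(sigma, n):
    """P with p_{i,j} = 1 iff j = iσ, as in (1.6)."""
    return [[1 if sigma[i + 1] == j + 1 else 0 for j in range(n)] for i in range(n)]

def vec_mat(x, M):
    """Row vector times matrix."""
    return tuple(sum(xi * mij for xi, mij in zip(x, col)) for col in zip(*M))

sigma = {1: 2, 2: 3, 3: 1}                     # the 3-cycle (1, 2, 3)
assert apply_sigma(('x1', 'x2', 'x3'), sigma) == ('x3', 'x1', 'x2')

x = (10, 20, 30)                               # numeric stand-ins
assert apply_sigma(x, sigma) == vec_mat(x, perm_matrix(sigma, 3))
```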
Exercise 36 If σ, τ ∈ Symn , show that x(σ τ ) = (xσ )τ .
Exercise 37 Let S be the set of all codes over Fq of length n. Let C 1 , C 2 ∈ S. Define
C 1 ∼ C 2 to mean that there exists an n × n permutation matrix P such that C 1 P = C 2 .
Prove that ∼ is an equivalence relation on S. Recall that ∼ is an equivalence relation on a
set S if the following three conditions are fulfilled:
(i) (reflexive) C ∼ C for all C ∈ S,
(ii) (symmetric) if C 1 ∼ C 2 , then C 2 ∼ C 1 , and
(iii) (transitive) if C 1 ∼ C 2 and C 2 ∼ C 3 , then C 1 ∼ C 3 .
The set of coordinate permutations that map a code C to itself forms a group, that is, a
set with an associative binary operation which has an identity and where all elements have
inverses, called the permutation automorphism group of C. This group is denoted by PAut(C).
So if C is a code of length n, then PAut(C) is a subgroup of the symmetric group Symn .
Exercise 38 Show that if C is the [n, 1] binary repetition code of Example 1.2.2, then
PAut(C) = Symn .
Exercise 39 Show that (1, 2)(5, 6), (1, 2, 3)(5, 6, 7), and (1, 2, 4, 5, 7, 3, 6) are automorphisms of the [7, 4] binary code H3 given in Example 1.2.3. These three permutations
generate a group of order 168 called the projective special linear group PSL2 (7). This is in
fact the permutation automorphism group of H3 .
Knowledge of the permutation automorphism group of a code can give important theoretical and practical information about the code. While these groups for some codes have been
determined, they are in general difficult to find. The following result shows the relationship
between the permutation automorphism group of a code and that of its dual; it also establishes
the connection between automorphism groups of permutation equivalent codes. Its proof is
left to the reader.
Theorem 1.6.4 Let C, C 1 , and C 2 be codes over Fq . Then:
(i) PAut(C) = PAut(C ⊥ ),
(ii) if q = 4, PAut(C) = PAut(C ⊥ H ), and
(iii) if C 1 P = C 2 for a permutation matrix P, then P −1 PAut(C 1 )P = PAut(C 2 ).
Exercise 40 Prove Theorem 1.6.4.
One can prove that if two codes are permutation equivalent, so are their extensions; see
Exercise 41. This is not necessarily the case for punctured codes.
Exercise 41 Prove that if C 1 and C 2 are permutation equivalent codes, then so are their
extensions Ĉ 1 and Ĉ 2 .
Example 1.6.5 Let C be the binary code with generator matrix
G = 1 1 0 0 0
    0 0 1 1 1 .
Let C ∗1 and C ∗5 be C punctured on coordinate 1 and 5, respectively. Then C ∗5 has only
even weight vectors, while C ∗1 has odd weight codewords. Thus although C is certainly
permutation equivalent to itself, C ∗1 and C ∗5 are not permutation equivalent.
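A quick check of this example (helper names ours; the generator rows are assumed to read 11000 and 00111):

```python
from itertools import product

def span(gen):
    """All F2 linear combinations of the generator rows."""
    n = len(gen[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gen)) % 2 for i in range(n))
            for coeffs in product([0, 1], repeat=len(gen))}

def puncture(code, i):
    """Delete coordinate i (0-indexed) from every codeword."""
    return {c[:i] + c[i + 1:] for c in code}

C = span([(1, 1, 0, 0, 0), (0, 0, 1, 1, 1)])
C_star_1 = puncture(C, 0)    # C punctured on coordinate 1
C_star_5 = puncture(C, 4)    # C punctured on coordinate 5

assert all(sum(c) % 2 == 0 for c in C_star_5)   # only even-weight codewords
assert any(sum(c) % 2 == 1 for c in C_star_1)   # has odd-weight codewords
```

Since a coordinate permutation preserves weights, the two punctured codes cannot be permutation equivalent.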
In some instances, the group PAut(C) is transitive as a permutation group; thus for every
ordered pair (i, j) of coordinates, there is a permutation in PAut(C) which sends coordinate
i to coordinate j. When PAut(C) is transitive, we have information about the structure of
its punctured codes. When PAut(Ĉ) is transitive, we have information about the minimum
weight of C.
Theorem 1.6.6 Let C be an [n, k, d] code.
(i) Suppose that PAut(C) is transitive. Then the n codes obtained from C by puncturing C
on a coordinate are permutation equivalent.
(ii) Suppose that PAut(Ĉ) is transitive. Then the minimum weight d of C is its minimum odd-like weight do . Furthermore, every minimum weight codeword of C is odd-like.
Proof: The proof of assertion (i) is left to the reader in Exercise 42. Now assume that
PAut(Ĉ) is transitive. Applying (i) to Ĉ, we conclude that puncturing Ĉ on any coordinate
gives a code permutation equivalent to C. Let c be a minimum weight vector of C and
assume that c is even-like. Then wt(ĉ) = d, where ĉ ∈ Ĉ is the extended vector. Puncturing
Ĉ on a coordinate where ĉ is nonzero gives a vector of weight d − 1 in a code permutation
equivalent to C, a contradiction.
Exercise 42 Prove Theorem 1.6.6(i).
Exercise 43 Let C be the code of Example 1.4.4.
(a) Is PAut(C) transitive?
(b) Find generator matrices for all six codes punctured on one point. Which of these punctured codes are equivalent?
(c) Find generator matrices for all 15 codes punctured on two points. Which of these
punctured codes are equivalent?
Exercise 44 Let C = C 1 ⊕ C 2 , where C 1 and C 2 are of length n 1 and n 2 , respectively. Prove
that
PAut(C 1 ) × PAut(C 2 ) ⊆ PAut(C),
where PAut(C 1 ) × PAut(C 2 ) is the direct product of the groups PAut(C 1 ) (acting on the first
n 1 coordinates of C) and PAut(C 2 ) (acting on the last n 2 coordinates of C).
For binary codes, the notion of permutation equivalence is the most general form
of equivalence. However, for codes over other fields, other forms of equivalence are
possible.
1.7 More general equivalence of codes
When considering codes over fields other than F2 , equivalence takes a more general form.
For these codes there are other maps which preserve the weight of codewords. These
maps include those which rescale coordinates and those which are induced from field
automorphisms (a topic we study more extensively in Chapter 3). We take up these maps
one at a time.
First, recall that a monomial matrix is a square matrix with exactly one nonzero entry in
each row and column. A monomial matrix M can be written either in the form D P or the
form P D1 , where D and D1 are diagonal matrices and P is a permutation matrix.
Example 1.7.1 The monomial matrix
0 a 0
M = 0 0 b
c 0 0
equals

        a 0 0   0 1 0             0 1 0   c 0 0
D P  =  0 b 0   0 0 1  = P D 1 =  0 0 1   0 a 0 .
        0 0 c   1 0 0             1 0 0   0 0 b
We will generally choose the form M = D P for representing monomial matrices; D is
called the diagonal part of M and P is the permutation part. This notation allows a more
compact form using (1.6), as we now illustrate.
Example 1.7.2 The monomial matrix M = D P of Example 1.7.1 can be written
diag(a, b, c)(1, 2, 3),
where diag(a, b, c) is the diagonal matrix D and (1, 2, 3) is the permutation matrix P written
in cycle form.
We will apply monomial maps M = D P on the right of row vectors x in the manner of the
next example.
Example 1.7.3 Let M = diag(a, b, c)(1, 2, 3) be the monomial map of Example 1.7.2 and
x = x1 x2 x3 = (x1 , x2 , x3 ). Then
xM = xD P = (ax1 , bx2 , cx3 )P = (cx3 , ax1 , bx2 ).
This example illustrates the more general principle of how to apply M = D P to a vector x
where σ is the permutation (in cycle form) associated to P. For all i:
• first, multiply the ith component of x by the ith diagonal entry of D, and
• second, move this product to coordinate position iσ .
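The two-step recipe can be sketched directly. In the sketch below (function name ours), the field elements a, b, c of Example 1.7.3 are replaced by numeric stand-ins purely for illustration:

```python
def apply_monomial(x, diag, sigma):
    """xM for M = D P: scale coordinate i by the ith diagonal entry,
    then move the product to position iσ (permutation as a dict on {1..n})."""
    n = len(x)
    y = [0] * n
    for i in range(1, n + 1):
        y[sigma[i] - 1] = diag[i - 1] * x[i - 1]
    return tuple(y)

# Example 1.7.3 pattern with numeric stand-ins a = 2, b = 3, c = 5:
a, b, c = 2, 3, 5
x = (7, 11, 13)
# (x1, x2, x3) D P = (c x3, a x1, b x2) for D = diag(a, b, c), P = (1, 2, 3)
assert apply_monomial(x, (a, b, c), {1: 2, 2: 3, 3: 1}) == (c * 13, a * 7, b * 11)
```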
With this concept of monomial maps, we now are ready to define monomial equivalence.
Let C 1 and C 2 be codes of the same length over a field Fq , and let G 1 be a generator matrix
for C 1 . Then C 1 and C 2 are monomially equivalent provided there is a monomial matrix M
so that G 1 M is a generator matrix of C 2 . More simply, C 1 and C 2 are monomially equivalent
if there is a monomial map M such that C 2 = C 1 M. Monomial equivalence and permutation
equivalence are precisely the same for binary codes.
Exercise 45 Let S be the set of all codes over Fq of length n. Let C 1 , C 2 ∈ S. Define
C 1 ∼ C 2 to mean that there exists an n × n monomial matrix M such that C 1 M = C 2 . Prove
that ∼ is an equivalence relation on S. (The definition of “equivalence relation” is given
in Exercise 37.)
There is one more type of map that we need to consider: that arising from automorphisms
of the field Fq , called Galois automorphisms. We will apply this in conjunction with
monomial maps. (We will apply field automorphisms on the right of field elements since
we are applying matrices on the right of vectors.) If γ is a field automorphism of Fq and
M = D P is a monomial map with entries in Fq , then applying the map Mγ to a vector x
is described by the following process, where again σ is the permutation associated to the
matrix P. For all i:
• first, multiply the ith component of x by the ith diagonal entry of D,
• second, move this product to coordinate position iσ , and
• third, apply γ to this component.
Example 1.7.4 The field F4 has automorphism γ given by xγ = x². If M = D P =
diag(a, b, c)(1, 2, 3) is the monomial map of Example 1.7.2 and x = x1 x2 x3 = (x1 , x2 , x3 ) ∈
F_4^3, then
xMγ = (ax1 , bx2 , cx3 )Pγ = (cx3 , ax1 , bx2 )γ = ((cx3 )², (ax1 )², (bx2 )²).
For instance,
(1, ω, 0)diag(ω, ω², 1)(1, 2, 3)γ = (0, ω², 1).
We say that two codes C 1 and C 2 of the same length over Fq are equivalent provided
there is a monomial matrix M and an automorphism γ of the field such that C 2 = C 1 Mγ .
This is the most general notion of equivalence that we will consider. Thus we have three
notions of when codes are the “same”: permutation equivalence, monomial equivalence,
and equivalence. All three are the same if the codes are binary; monomial equivalence
and equivalence are the same if the field considered has a prime number of elements.
The fact that these are the appropriate maps to consider for equivalence is a consequence of a theorem by MacWilliams [212] regarding weight preserving maps discussed in
Section 7.9.
Two equivalent codes have the same weight distribution. However, two codes with the
same weight distribution need not be equivalent as Example 1.6.1 shows. Exercise 35 shows
that if C 1 and C 2 are permutation equivalent codes, then so are their duals under the same
map. However, if C 1 M = C 2 , it is not necessarily the case that C1⊥ M = C2⊥ .
Example 1.7.5 Let C 1 and C 2 be [2, 1, 2] codes over F4 with generator matrices [1 1]
and [1 ω], respectively. Then the duals C1⊥ and C2⊥ under the ordinary inner product
have generator matrices [1 1] and [1 ω²], respectively. Notice that C 1 diag(1, ω) = C 2 , but
C1⊥ diag(1, ω) ≠ C2⊥ .
In the above example, C 1 is self-dual but C 2 is not. Thus equivalence may not preserve self-duality. However, the following theorem is valid, and its proof is left as an
exercise.
Theorem 1.7.6 Let C be a code over Fq . The following hold:
(i) If M is a monomial matrix with entries only from {0, −1, 1}, then C is self-dual if and
only if C M is self-dual.
(ii) If q = 3 and C is equivalent to C 1 , then C is self-dual if and only if C 1 is self-dual.
(iii) If q = 4 and C is equivalent to C 1 , then C is Hermitian self-dual if and only if C 1 is
Hermitian self-dual.
As there are three versions of equivalence, there are three possible automorphism groups.
Let C be a code over Fq . We defined the permutation automorphism group PAut(C) of C in
the last section. The set of monomial matrices that map C to itself forms the group MAut(C)
called the monomial automorphism group of C. Finally, the set of maps of the form Mγ ,
where M is a monomial matrix and γ is a field automorphism, that map C to itself forms
the group ΓAut(C), called the automorphism group of C.3 That MAut(C) and ΓAut(C) are
groups is left as an exercise. In the binary case all three groups are identical. If q is a prime,
MAut(C) = ΓAut(C). In general, PAut(C) ⊆ MAut(C) ⊆ ΓAut(C).
Exercise 46 For 1 ≤ i ≤ 3 let Di be diagonal matrices, Pi permutation matrices, and γi
automorphisms of Fq .
(a) You can write (D1 P1 γ1 )(D2 P2 γ2 ) in the form D3 P3 γ3 . Find D3 , P3 , and γ3 in terms of
D1 , D2 , P1 , P2 , γ1 , and γ2 .
(b) You can write (D1 P1 γ1 )−1 in the form D2 P2 γ2 . Find D2 , P2 , and γ2 in terms of D1 , P1 ,
and γ1 .
Exercise 47 Let S be the set of all codes over Fq of length n. Let C 1 , C 2 ∈ S. Define C 1 ∼ C 2 to mean that there exists an n × n monomial matrix M and an automorphism γ of Fq such that C 1 Mγ = C 2 . Prove that ∼ is an equivalence relation on S. (The
definition of “equivalence relation” is given in Exercise 37.) You may find Exercise 46
helpful.
Exercise 48 Prove that MAut(C) and ΓAut(C) are groups. (Hint: Use Exercise 46.)
Example 1.7.7 Let C be the tetracode with generator matrix as in Example 1.3.3. Labeling the coordinates by {1, 2, 3, 4}, PAut(C) is the group of order 3 generated by
the permutation (1, 3, 4). MAut(C) = ΓAut(C) is a group of order 48 generated by
diag(1, 1, 1, −1)(1, 2, 3, 4) and diag(1, 1, 1, −1)(1, 2).
Exercise 49 Let C be the tetracode of Examples 1.3.3 and 1.7.7.
(a) Verify that the maps listed in Example 1.7.7 are indeed automorphisms of the tetracode.
(b) Write the generator (1, 3, 4) of PAut(C) as a product of the two generators given for
MAut(C) in Example 1.7.7.
(c) (Hard) Prove that the groups PAut(C) and ΓAut(C) are as claimed in Example 1.7.7.
3 The notation for automorphism groups is not uniform in the literature. For example, G(C) or Aut(C) are sometimes
used for one of the automorphism groups of C. As a result of this, we avoid both of these notations.
Example 1.7.8 Let G′6 be the generator matrix of a [6, 3] code G′6 over F4 , where

      1 ω 1 0 0 ω
G′6 = 0 1 ω 1 0 ω .
      0 0 1 ω 1 ω
Label the columns {1, 2, 3, 4, 5, 6}. If this generator matrix is row reduced, we obtain the
matrix
1 0 0 1 ω ω
0 1 0 ω ω 1 .
0 0 1 ω 1 ω
Swapping columns 5 and 6 gives the generator matrix G 6 of Example 1.3.4; thus G ′6 (5, 6) =
G 6 . (The codes are equivalent and both are called the hexacode.) Using group theoretic
arguments, one can verify the following information about the three automorphism groups
of G ′6 ; however, one can also use algebraic systems such as Magma or Gap to carry out this
verification: PAut(G ′6 ) is a group of order 60 generated by the permutations (1, 2, 6)(3, 5, 4)
and (1, 2, 3, 4, 5). MAut(G ′6 ) is a group of order 3 · 360 generated by the monomial map
diag(ω, 1, 1, ω, ω, ω)(1, 2, 6) and the permutation (1, 2, 3, 4, 5).4 Finally, ΓAut(G′6 ) is a
group twice as big as MAut(G′6 ), generated by MAut(G′6 ) and diag(1, ω, ω, ω, ω, 1)(1, 6)γ ,
where γ is the automorphism of F4 given by xγ = x².
Exercise 50 Let G ′6 be the hexacode of Example 1.7.8.
(a) Verify that (1, 2, 6)(3, 5, 4) and (1, 2, 3, 4, 5) are elements of PAut(G ′6 ).
(b) Verify that diag(ω, 1, 1, ω, ω, ω)(1, 2, 6) is an element of MAut(G ′6 ).
(c) Verify that diag(1, ω, ω, ω, ω, 1)(1, 6)γ is an element of ΓAut(G′6 ).
Recall that PAut(C ⊥ ) = PAut(C), by Theorem 1.6.4. One can find MAut(C ⊥ ) and
ΓAut(C ⊥ ) from MAut(C) and ΓAut(C), although the statement is not so simple.
Theorem 1.7.9 Let C be a code over Fq . Then:
(i) MAut(C ⊥ ) = {D −1 P | D P ∈ MAut(C)}, and
(ii) ΓAut(C ⊥ ) = {D −1 Pγ | D Pγ ∈ ΓAut(C)}.
In the case of codes over F4 , PAut(C ⊥ H ) = PAut(C) by Theorem 1.6.4; the following extends
this in the nicest possible fashion.
Theorem 1.7.10 Let C be a code over F4 . Then:
(i) MAut(C ⊥ H ) = MAut(C), and
(ii) ΓAut(C ⊥ H ) = ΓAut(C).
The third part of Theorem 1.6.4 is generalized as follows.
4 This group is isomorphic to the nonsplitting central extension of the cyclic group of order 3 by the alternating
group on six points.
Theorem 1.7.11 Let C 1 and C 2 be codes over Fq . Let P be a permutation matrix, M a
monomial matrix, and γ an automorphism of Fq .
(i) If C 1 P = C 2 , then P −1 PAut(C 1 )P = PAut(C 2 ).
(ii) If C 1 M = C 2 , then M −1 MAut(C 1 )M = MAut(C 2 ).
(iii) If C 1 Mγ = C 2 , then (Mγ )−1 ΓAut(C 1 )Mγ = ΓAut(C 2 ).
Exercise 51 Prove Theorems 1.7.9, 1.7.10, and 1.7.11.
Exercise 52 Using Theorems 1.6.4 and 1.7.11, give generators of PAut(G 6 ), MAut(G 6 ),
and ΓAut(G 6 ) from the information given in Example 1.7.8.
As with PAut(C), we can speak of transitivity of the automorphism groups MAut(C) or
ΓAut(C). To do this we consider only the permutation parts of the maps in these groups.
Specifically, define MAutPr (C) to be the set {P | D P ∈ MAut(C)} and ΓAutPr (C) to be
{P | D Pγ ∈ ΓAut(C)}. (The subscript Pr stands for projection. The groups MAut(C) and
ΓAut(C) are semi-direct products; the groups MAutPr (C) and ΓAutPr (C) are obtained from
MAut(C) and ΓAut(C) by projecting onto the permutation part of the semi-direct product.) For instance, in Example 1.7.7, MAutPr (C) = ΓAutPr (C) = Sym4 ; in Example 1.7.8,
MAutPr (C) is the alternating group on six points and ΓAutPr (C) = Sym6 . We leave the proof
of the following theorem as an exercise.
Theorem 1.7.12 Let C be a linear code over Fq . Then:
(i) MAutPr (C) and ΓAutPr (C) are subgroups of the symmetric group Symn , and
(ii) PAut(C) ⊆ MAutPr (C) ⊆ ΓAutPr (C).
Exercise 53 Prove Theorem 1.7.12. (Hint: Use Exercise 46.)
We now say that MAut(C) (respectively, ΓAut(C)) is transitive as a permutation group
if MAutPr (C) (respectively, ΓAutPr (C)) is transitive. The following is a generalization of
Theorem 1.6.6.
Theorem 1.7.13 Let C be an [n, k, d] code.
(i) Suppose that MAut(C) is transitive. Then the n codes obtained from C by puncturing
C on a coordinate are monomially equivalent.
(ii) Suppose that ΓAut(C) is transitive. Then the n codes obtained from C by puncturing C
on a coordinate are equivalent.
(iii) Suppose that either MAut(Ĉ) or ΓAut(Ĉ) is transitive. Then the minimum weight d of
C is its minimum odd-like weight do . Furthermore, every minimum weight codeword
of C is odd-like.
Exercise 54 Prove Theorem 1.7.13.
1.8 Hamming codes
We now generalize the binary code H3 of Example 1.2.3. The parity check matrix obtained
in that example was
0 1 1 1 1 0 0
H = [AT | I3 ] = 1 0 1 1 0 1 0 .
1 1 0 1 0 0 1
Notice that the columns of this parity check matrix are all the distinct nonzero binary
columns of length 3. So H3 is equivalent to the code with parity check matrix
0 0 0 1 1 1 1
H ′ = 0 1 1 0 0 1 1
1 0 1 0 1 0 1
whose columns are the numbers 1 through 7 written as binary numerals (with leading 0s as
necessary to have a 3-tuple) in their natural order.
This form generalizes easily. Let n = 2^r − 1, with r ≥ 2. Then the r × (2^r − 1) matrix
Hr whose columns, in order, are the numbers 1, 2, . . . , 2^r − 1 written as binary numerals,
is the parity check matrix of an [n = 2^r − 1, k = n − r ] binary code. Any rearrangement
of columns of Hr gives an equivalent code, and hence any one of these equivalent codes
will be called the binary Hamming code of length n = 2^r − 1 and denoted by either Hr or
H2,r . It is customary when naming a code, such as the Hamming code, the tetracode, or the
hexacode, to identify equivalent codes. We will follow this practice with these and other
codes as well.
Since the columns of Hr are distinct and nonzero, the minimum distance is at least 3 by
Corollary 1.4.14. Since the columns corresponding to the numbers 1, 2, and 3 are linearly
dependent, the minimum distance equals 3, by the same corollary. Thus Hr is a binary
[2^r − 1, 2^r − 1 − r, 3] code. In the following sense, these codes are unique.
Theorem 1.8.1 Any [2^r − 1, 2^r − 1 − r, 3] binary code is equivalent to the binary Hamming code Hr .
Exercise 55 Prove Theorem 1.8.1.
Exercise 56 Prove that every [8, 4, 4] binary code is equivalent to the extended Hamming
code Ĥ3 . (So for that reason, we say that the [8, 4, 4] binary code is unique.)
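The binary-numeral description of Hr translates directly into code. The sketch below (function name ours) builds H3 and brute-forces its null space to confirm the [7, 4, 3] parameters:

```python
from itertools import product

def hamming_parity_check(r):
    """H_r: columns are 1, 2, ..., 2^r - 1 written as r-bit binary numerals."""
    cols = [[(m >> (r - 1 - b)) & 1 for b in range(r)] for m in range(1, 2 ** r)]
    return [list(row) for row in zip(*cols)]          # r x (2^r - 1)

H = hamming_parity_check(3)
assert H == [[0, 0, 0, 1, 1, 1, 1],
             [0, 1, 1, 0, 0, 1, 1],
             [1, 0, 1, 0, 1, 0, 1]]

# Brute-force the null space of H over F2: H_3 is a [7, 4, 3] code.
code = [x for x in product([0, 1], repeat=7)
        if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)]
assert len(code) == 2 ** 4
assert min(sum(x) for x in code if any(x)) == 3
```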
Similarly, Hamming codes Hq,r can be defined over an arbitrary finite field Fq . For r ≥ 2,
Hq,r has parity check matrix Hq,r defined by choosing for its columns a nonzero vector
from each 1-dimensional subspace of F_q^r . (Alternately, these columns are the points of the
projective geometry PG(r − 1, q).) There are (q^r − 1)/(q − 1) 1-dimensional subspaces.
Therefore Hq,r has length n = (q^r − 1)/(q − 1), dimension n − r , and redundancy r . As
no two columns are multiples of each other, Hq,r has minimum weight at least 3. Adding
two nonzero vectors from two different 1-dimensional subspaces gives a nonzero vector
from yet a third 1-dimensional space; hence Hq,r has minimum weight 3. When q = 2,
H2,r is precisely the code Hr .
Suppose you begin with one particular order of the 1-dimensional subspaces and one
particular choice of representatives for those subspaces to form the parity check matrix
Hq,r for Hq,r . If you choose a different parity check matrix H′q,r by choosing a different
order for the list of subspaces and choosing different representatives from these subspaces,
H′q,r can be obtained from Hq,r by rescaling and reordering the columns – precisely what
is accomplished by multiplying Hq,r on the right by some monomial matrix. So any code
you get in the above manner is monomially equivalent to any other code obtained in the
same manner. Again Hq,r will therefore refer to any code in the equivalence class. As in
the binary case, these codes are unique, up to equivalence.
Theorem 1.8.2 Any [(q^r − 1)/(q − 1), (q^r − 1)/(q − 1) − r, 3] code over Fq is monomially equivalent to the Hamming code Hq,r .
Exercise 57 Prove Theorem 1.8.2.
Exercise 58 Prove that the tetracode of Example 1.3.3 was appropriately denoted in that
example as H3,2 . In other words, show that the tetracode is indeed a Hamming code.
The duals of the Hamming codes are called simplex codes. They are [(q^r − 1)/(q − 1), r ]
codes whose codeword weights have a rather interesting property. The simplex code H3⊥ has
only nonzero codewords of weight 4 (see Example 1.2.3). The tetracode, being a self-dual
Hamming code, is a simplex code; its nonzero codewords all have weight 3. In general, we
have the following, which will be proved as part of Theorem 2.7.5.
Theorem 1.8.3 The nonzero codewords of the [(q^r − 1)/(q − 1), r ] simplex code over Fq
all have weight q^(r−1) .
We now give a construction of the binary simplex codes and prove Theorem 1.8.3 in
this case. These codes are produced by a modification of the (u | u + v) construction of
Section 1.5.5.
Let G 2 be the matrix

G 2 = 0 1 1
      1 0 1 .
For r ≥ 3, define G r inductively by

       0 · · · 0   1   1 · · · 1
G r =
       G r−1       0   G r−1

where the 0 below the middle 1 denotes a column of r − 1 0s.
We claim the code S r generated by G r is the dual of Hr . Clearly, G r has one more
row than G r−1 and, as G 2 has 2 rows, G r has r rows. Let G r have n r columns. So n 2 =
2² − 1 and n r = 2n r−1 + 1; by induction n r = 2^r − 1. The columns of G 2 are nonzero and
distinct; clearly by construction, the columns of G r are nonzero and distinct if the columns
of G r−1 are also nonzero and distinct. So by induction G r has 2^r − 1 distinct nonzero
columns of length r . But there are only 2^r − 1 possible distinct nonzero r -tuples; these
are the binary expansions of 1, 2, . . . , 2^r − 1. (In fact, the columns are in this order.) So
S r = Hr⊥ .
The nonzero codewords of S 2 have weight 2. Assume the nonzero codewords of S r−1 have
weight 2^(r−2) . Then the nonzero codewords of the subcode generated by the last r − 1 rows of
G r have the form (a, 0, a), where a ∈ S r−1 . So these codewords have weight 2 · 2^(r−2) =
2^(r−1) . Also the top row of G r has weight 1 + 2^(r−1) − 1 = 2^(r−1) . The remaining nonzero
codewords of S r have the form (a, 1, a + 1), where a ∈ S r−1 is nonzero and 1 is the all-ones
vector. As wt(a + 1) = 2^(r−2) −
1, wt(a, 1, a + 1) = 2^(r−2) + 1 + 2^(r−2) − 1 = 2^(r−1) . Thus by induction S r has all nonzero
codewords of weight 2^(r−1) .
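The recursion for G r and the induction on weights can be confirmed exhaustively for small r (helper names ours):

```python
from itertools import product

def span(gen):
    """All F2 linear combinations of the generator rows."""
    n = len(gen[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, gen)) % 2 for i in range(n))
            for coeffs in product([0, 1], repeat=len(gen))}

def simplex_generator(r):
    """G_r: top row (0...0, 1, 1...1); below it, [G_{r-1} | 0-column | G_{r-1}]."""
    if r == 2:
        return [[0, 1, 1], [1, 0, 1]]
    G = simplex_generator(r - 1)
    n = len(G[0])
    return [[0] * n + [1] + [1] * n] + [row + [0] + row for row in G]

for r in (2, 3, 4):
    S = span(simplex_generator(r))
    assert len(S) == 2 ** r                                 # [2^r - 1, r] code
    assert all(sum(c) == 2 ** (r - 1) for c in S if any(c)) # constant weight 2^(r-1)
```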
1.9 The Golay codes
In this section we define four codes that are called the Golay codes. The first two are binary
and the last two are ternary. In the binary case, the shorter of the codes is obtained from the
longer by puncturing and the longer from the shorter by extending. The same holds in the
ternary case if the generator matrix is chosen in the right form. (See Exercise 61.) Although
the hexacode G 6 of Example 1.3.4 and the punctured code G ∗6 are technically not Golay
codes, they have so many properties similar to the binary and ternary Golay codes, they are
often referred to as the Golay codes over F4 . These codes have had an exceptional place in
the history of coding theory. The binary code of length 23 and the ternary code of length
11 were first described by M. J. E. Golay in 1949 [102].
1.9.1
The binary Golay codes
We let G_24 be the [24, 12] code with generator matrix G_24 = [I_12 | A] in standard form,
where

    [ 0 1 1 1 1 1 1 1 1 1 1 1 ]
    [ 1 1 1 0 1 1 1 0 0 0 1 0 ]
    [ 1 1 0 1 1 1 0 0 0 1 0 1 ]
    [ 1 0 1 1 1 0 0 0 1 0 1 1 ]
    [ 1 1 1 1 0 0 0 1 0 1 1 0 ]
A = [ 1 1 1 0 0 0 1 0 1 1 0 1 ]
    [ 1 1 0 0 0 1 0 1 1 0 1 1 ]
    [ 1 0 0 0 1 0 1 1 0 1 1 1 ]
    [ 1 0 0 1 0 1 1 0 1 1 1 0 ]
    [ 1 0 1 0 1 1 0 1 1 1 0 0 ]
    [ 1 1 0 1 1 0 1 1 1 0 0 0 ]
    [ 1 0 1 1 0 1 1 1 0 0 0 1 ] .
Notice how A is constructed. The matrix A is an example of a bordered reverse circulant
matrix. Label the columns of A by ∞, 0, 1, 2, . . . , 10. The first row contains 0 in column ∞
and 1 elsewhere. To obtain the second row, a 1 is placed in column ∞ and a 1 is placed in
columns 0, 1, 3, 4, 5, and 9; these numbers are precisely the squares of the integers modulo
11. That is, 0^2 = 0, 1^2 ≡ 10^2 ≡ 1 (mod 11), 2^2 ≡ 9^2 ≡ 4 (mod 11), etc. The third row of A
is obtained by putting a 1 in column ∞ and then shifting the components in the second row
one place to the left and wrapping the entry in column 0 around to column 10. The fourth
row is obtained from the third in the same manner, as are the remaining rows.
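The bordered reverse circulant construction can be expressed compactly in code. This sketch (the variable names are ours) builds A from the squares modulo 11 and the left shifts described above:

```python
# Build the 12 x 12 bordered reverse circulant matrix A used for G_24.
squares_mod_11 = {(i * i) % 11 for i in range(1, 11)} | {0}   # {0, 1, 3, 4, 5, 9}

first_row = [0] + [1] * 11                      # row 1: 0 in column "infinity", 1 elsewhere
row2_tail = [1 if j in squares_mod_11 else 0 for j in range(11)]

A = [first_row, [1] + row2_tail]
tail = row2_tail
for _ in range(10):                             # rows 3..12: shift left, wrap column 0 to 10
    tail = tail[1:] + tail[:1]
    A.append([1] + tail)

for row in A[1:]:
    assert sum(row) == 7                        # border 1 plus the six 1s noted in the text
```

The final assertion matches the observation below that each row of A_1 (A without row one and column ∞) has weight 6.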
We give some elementary properties of G 24 . For ease of notation, let A1 be the 11 × 11
reverse circulant matrix obtained from A by deleting row one and column ∞. Note first that
the rows of G 24 have weights 8 and 12. In particular the inner product of any row of G 24
with itself is 0. The inner product of row one with any other row is also 0 as each row of A1
has weight 6. To find the inner product of any row below the first with any other row below
the first, by the circulant nature of A1 , we can shift both rows so that one of them is row two.
(For example, the inner product of row four with row seven is the same as the inner product
of row two with row five.) The inner product of row two with any row below it is 0 by
direct inspection. Therefore G 24 is self-dual with all rows in the generator matrix of weight
divisible by four. By Theorem 1.4.8(i), all codewords of G 24 have weights divisible by
four.
Thus G_24 is a [24, 12, d] self-dual code, with d = 4 or 8. Suppose d = 4. Notice that
A^T = A. As G_24 is self-dual, by Theorem 1.2.1, [A^T | I_12] = [A | I_12] is also a generator
matrix. Hence if (a, b) is a codeword of G_24, where a, b ∈ F_2^12, so is (b, a). Then if c = (a, b)
is a codeword of G 24 of weight 4, we may assume wt(a) ≤ wt(b). If wt(a) = 0, a = 0 and
as G 24 is in standard form, b = 0, which is a contradiction. If wt(a) = 1, then c is one of the
rows of G 24 , which is also a contradiction. Finally, if wt(a) = 2, then c is the sum of two
rows of G 24 . The same shifting argument as earlier shows that the weight of c is the same as
the weight of a codeword that is the sum of row two of G 24 and another row. By inspection,
none of these sums contributes exactly two to the weight of the right 12 components. So
d = 8.
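Since G_24 has only 2^12 = 4096 codewords, both conclusions can also be confirmed by brute force. The sketch below (ours, not the book's) hard-codes the matrix A displayed earlier:

```python
from itertools import product

A = [
    [0,1,1,1,1,1,1,1,1,1,1,1],
    [1,1,1,0,1,1,1,0,0,0,1,0],
    [1,1,0,1,1,1,0,0,0,1,0,1],
    [1,0,1,1,1,0,0,0,1,0,1,1],
    [1,1,1,1,0,0,0,1,0,1,1,0],
    [1,1,1,0,0,0,1,0,1,1,0,1],
    [1,1,0,0,0,1,0,1,1,0,1,1],
    [1,0,0,0,1,0,1,1,0,1,1,1],
    [1,0,0,1,0,1,1,0,1,1,1,0],
    [1,0,1,0,1,1,0,1,1,1,0,0],
    [1,1,0,1,1,0,1,1,1,0,0,0],
    [1,0,1,1,0,1,1,1,0,0,0,1],
]
G = [[int(i == j) for j in range(12)] + A[i] for i in range(12)]  # [I12 | A]

# self-dual: every pair of generator rows (a row with itself included) is orthogonal
for g in G:
    for h in G:
        assert sum(a * b for a, b in zip(g, h)) % 2 == 0

# minimum weight 8: enumerate all 2^12 codewords
wts = set()
for coeffs in product([0, 1], repeat=12):
    if any(coeffs):
        c = [sum(a * g[j] for a, g in zip(coeffs, G)) % 2 for j in range(24)]
        wts.add(sum(c))
assert min(wts) == 8
```

The enumeration also shows every codeword weight is divisible by four, as Theorem 1.4.8(i) predicts.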
If we puncture in any of the coordinates, we obtain a [23, 12, 7] binary code G 23 . It turns
out, as we will see later, that all these punctured codes are equivalent. By Exercise 59,
adding an overall parity check to one of these punctured codes (in the same position which
had been punctured) gives exactly the same G 24 back. In the future, any code equivalent to
G 23 will be called the binary Golay code and any code equivalent to G 24 will be called the
extended binary Golay code. The codes G 23 and G 24 have amazing properties and a variety
of constructions, as we will see throughout this book.
Exercise 59 Prove that if G 24 is punctured in any coordinate and the resulting code
is extended in the same position, exactly the same code G 24 is obtained. Hint: See
Exercise 27.
1.9.2
The ternary Golay codes
The ternary code G_12 is the [12, 6] code over F_3 with generator matrix G_12 = [I_6 | A] in
standard form, where

    [ 0  1  1  1  1  1 ]
    [ 1  0  1 −1 −1  1 ]
A = [ 1  1  0  1 −1 −1 ]
    [ 1 −1  1  0  1 −1 ]
    [ 1 −1 −1  1  0  1 ]
    [ 1  1 −1 −1  1  0 ] .
In a fashion analogous to that of Section 1.9.1, we can show that G 12 is a [12, 6, 6]
self-dual code. The code G 11 is an [11, 6, 5] code obtained from G 12 by puncturing. Again,
equivalent codes are obtained regardless of the coordinate. However, adding an overall
parity check to G 11 in the same coordinate may not give the same G 12 back; it will give
either a [12, 6, 6] code or a [12, 6, 5] code depending upon the coordinate; see Exercise 61.
Exercise 60 Prove that G 12 is a [12, 6, 6] self-dual ternary code.
Exercise 61 Number the columns of the matrix A used to generate G 12 by ∞, 0, 1, 2, 3,
4. Let G ′12 be obtained from G 12 by scaling column ∞ by −1.
(a) Show how to give the entries in row two of A using squares and non-squares of integers
modulo 5.
(b) Why is G ′12 a [12, 6, 6] self-dual code? Hint: Use Exercise 60.
(c) Show that puncturing G ′12 in any coordinate and adding back an overall parity check in
that same position gives the same code G ′12 .
(d) Show that if G 12 is punctured in coordinate ∞ and this code is then extended in the
same position, the resulting code is G ′12 .
(e) Show that if G 12 is punctured in any coordinate other than ∞ and this code is then
extended in the same position, the resulting code is a [12, 6, 5] code.
In Exercise 61, we scaled the first column of A by −1 to obtain a [12, 6, 6] self-dual code
G ′12 equivalent to G 12 . By that exercise, if we puncture G ′12 in any coordinate and then extend
in the same coordinate, we get G ′12 back. In Chapter 10 we will see that these punctured
codes are all equivalent to each other and to G 11 . As a result any [11, 6, 5] code equivalent
to one obtained by puncturing G ′12 in any coordinate will be called the ternary Golay code;
any [12, 6, 6] code equivalent to G ′12 (or G 12 ) will be called the extended ternary Golay
code.
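An analogous brute-force check works over F_3, since G_12 has only 3^6 = 729 codewords. The sketch below (ours) writes −1 as 2 in F_3:

```python
from itertools import product

# A for G12 = [I6 | A] over F3, with -1 written as 2
A = [
    [0, 1, 1, 1, 1, 1],
    [1, 0, 1, 2, 2, 1],
    [1, 1, 0, 1, 2, 2],
    [1, 2, 1, 0, 1, 2],
    [1, 2, 2, 1, 0, 1],
    [1, 1, 2, 2, 1, 0],
]
G = [[int(i == j) for j in range(6)] + A[i] for i in range(6)]

# self-dual over F3: all inner products of generator rows vanish mod 3
for g in G:
    for h in G:
        assert sum(a * b for a, b in zip(g, h)) % 3 == 0

# minimum weight 6: enumerate all 3^6 codewords
wts = set()
for coeffs in product(range(3), repeat=6):
    if any(coeffs):
        c = [sum(a * g[j] for a, g in zip(coeffs, G)) % 3 for j in range(12)]
        wts.add(sum(1 for x in c if x != 0))
assert min(wts) == 6    # G12 is a [12, 6, 6] code, as Exercise 60 asks you to prove
```

This confirms the claims of Exercise 60 computationally, though the exercise of course asks for a proof.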
1.10
Reed–Muller codes
In this section, we introduce the binary Reed–Muller codes. Nonbinary generalized Reed–
Muller codes will be examined in Section 13.2.3. The binary codes were first constructed
and explored by Muller [241] in 1954, and a majority logic decoding algorithm for them was
described by Reed [293] also in 1954. Although their minimum distance is relatively small,
they are of practical importance because of the ease with which they can be implemented
and decoded. They are of mathematical interest because of their connection with finite affine
and projective geometries; see [4, 5]. These codes can be defined in several different ways.
Here we choose a recursive definition based on the (u | u + v) construction.
Let m be a positive integer and r a nonnegative integer with r ≤ m. The binary codes
we construct will have length 2^m. For each length there will be m + 1 linear codes, denoted
R(r, m) and called the r th order Reed–Muller, or RM, code of length 2^m. The codes R(0, m)
and R(m, m) are trivial codes: the 0th order RM code R(0, m) is the binary repetition code
of length 2^m with basis {1}, and the mth order RM code R(m, m) is the entire space F_2^{2^m}.
For 1 ≤ r < m, define

R(r, m) = {(u, u + v) | u ∈ R(r, m − 1), v ∈ R(r − 1, m − 1)}.        (1.7)
Let G(0, m) = [11 · · · 1] and G(m, m) = I_{2^m}. From the above description, these are
generator matrices for R(0, m) and R(m, m), respectively. For 1 ≤ r < m, using (1.5),
a generator matrix G(r, m) for R(r, m) is

G(r, m) = [ G(r, m − 1)     G(r, m − 1)     ]
          [ O               G(r − 1, m − 1) ] .
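The recursion translates directly into code. This sketch (the function name rm_gen is ours) returns G(r, m) as a list of 0/1 rows:

```python
def rm_gen(r, m):
    """Generator matrix G(r, m) of R(r, m) via the (u | u + v) recursion."""
    if r == 0:
        return [[1] * (2 ** m)]                                    # repetition code
    if r == m:
        return [[int(i == j) for j in range(2 ** m)] for i in range(2 ** m)]  # full space
    top = [row + row for row in rm_gen(r, m - 1)]                  # [G(r,m-1) G(r,m-1)]
    bottom = [[0] * 2 ** (m - 1) + row for row in rm_gen(r - 1, m - 1)]  # [O G(r-1,m-1)]
    return top + bottom

assert rm_gen(1, 2) == [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
```

The assertion matches the matrix G(1, 2) displayed below.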
We illustrate this construction by producing the generator matrices for R(r, m) with
1 ≤ r < m ≤ 3:

G(1, 2) = [ 1 0 1 0 ]             [ 1 0 1 0 1 0 1 0 ]
          [ 0 1 0 1 ] , G(1, 3) = [ 0 1 0 1 0 1 0 1 ] , and
          [ 0 0 1 1 ]             [ 0 0 1 1 0 0 1 1 ]
                                  [ 0 0 0 0 1 1 1 1 ]

G(2, 3) = [ 1 0 0 0 1 0 0 0 ]
          [ 0 1 0 0 0 1 0 0 ]
          [ 0 0 1 0 0 0 1 0 ]
          [ 0 0 0 1 0 0 0 1 ]
          [ 0 0 0 0 1 0 1 0 ]
          [ 0 0 0 0 0 1 0 1 ]
          [ 0 0 0 0 0 0 1 1 ] .
From these matrices, notice that R(1, 2) and R(2, 3) are both the set of all even weight
vectors in F_2^4 and F_2^8, respectively. Notice also that R(1, 3) is an [8, 4, 4] self-dual code,
which must be Ĥ_3 by Exercise 56.
The dimension, minimum weight, and duals of the binary Reed–Muller codes can be
computed directly from their definitions.
Theorem 1.10.1 Let r be an integer with 0 ≤ r ≤ m. Then the following hold:
(i) R(i, m) ⊆ R( j, m), if 0 ≤ i ≤ j ≤ m.
(ii) The dimension of R(r, m) equals

(m choose 0) + (m choose 1) + · · · + (m choose r).

(iii) The minimum weight of R(r, m) equals 2^{m−r}.
(iv) R(m, m)^⊥ = {0}, and if 0 ≤ r < m, then R(r, m)^⊥ = R(m − r − 1, m).
Proof: Part (i) is certainly true if m = 1 by direct computation and if j = m as R(m, m)
is the full space F_2^{2^m}. Assume inductively that R(k, m − 1) ⊆ R(ℓ, m − 1) for all 0 ≤ k ≤
ℓ < m. Let 0 < i ≤ j < m. Then:

R(i, m) = {(u, u + v) | u ∈ R(i, m − 1), v ∈ R(i − 1, m − 1)}
        ⊆ {(u, u + v) | u ∈ R( j, m − 1), v ∈ R( j − 1, m − 1)}
        = R( j, m).

So (i) follows by induction if 0 < i. If i = 0, we only need to show that the all-one vector
of length 2^m is in R( j, m) for j < m. Inductively assume the all-one vector of length 2^{m−1}
is in R( j, m − 1). Then by definition (1.7), we see that the all-one vector of length 2^m is in
R( j, m) as one choice for u is 1 and one choice for v is 0.
For (ii) the result is true for r = m as R(m, m) = F_2^{2^m} and

(m choose 0) + (m choose 1) + · · · + (m choose m) = 2^m.

It is also true for m = 1 by inspection. Now assume that R(i, m − 1) has dimension

(m−1 choose 0) + (m−1 choose 1) + · · · + (m−1 choose i) for all 0 ≤ i < m.
By the discussion in Section 1.5.5 (and Exercise 33), R(r, m) has dimension the sum of the
dimensions of R(r, m − 1) and R(r − 1, m − 1), that is,

[(m−1 choose 0) + (m−1 choose 1) + · · · + (m−1 choose r)] + [(m−1 choose 0) + (m−1 choose 1) + · · · + (m−1 choose r − 1)].

The result follows by the elementary properties of binomial coefficients:

(m−1 choose 0) = (m choose 0) and (m−1 choose i − 1) + (m−1 choose i) = (m choose i).
Part (iii) is again valid for m = 1 by inspection and for both r = 0 and r = m as R(0, m)
is the binary repetition code of length 2^m and R(m, m) = F_2^{2^m}. Assume that R(i, m − 1)
has minimum weight 2^{m−1−i} for all 0 ≤ i < m. If 0 < r < m, then by definition (1.7) and
the discussion in Section 1.5.5 (and Exercise 33), R(r, m) has minimum weight min{2 ·
2^{m−1−r}, 2^{m−1−(r−1)}} = 2^{m−r}.
To prove (iv), we first note that R(m, m)^⊥ is {0} since R(m, m) = F_2^{2^m}. So if we define
R(−1, m) = {0}, then R(−1, m)^⊥ = R(m − (−1) − 1, m) for all m > 0. By direct computation, R(r, m)^⊥ = R(m − r − 1, m) for all r with −1 ≤ r ≤ m = 1. Assume inductively
that if −1 ≤ i ≤ m − 1, then R(i, m − 1)^⊥ = R((m − 1) − i − 1, m − 1). Let 0 ≤ r <
m. To prove R(r, m)^⊥ = R(m − r − 1, m), it suffices to show that R(m − r − 1, m) ⊆
R(r, m)^⊥ as dim R(r, m) + dim R(m − r − 1, m) = 2^m by (ii). Notice that with the definition of R(−1, m), (1.7) extends to the case r = 0. Let x = (a, a + b) ∈ R(m − r − 1, m)
where a ∈ R(m − r − 1, m − 1) and b ∈ R(m − r − 2, m − 1), and let y = (u, u + v) ∈
R(r, m) where u ∈ R(r, m − 1) and v ∈ R(r − 1, m − 1). Then x · y = 2a · u + a · v + b ·
u + b · v = a · v + b · u + b · v. Each term is 0 as follows. As a ∈ R(m − r − 1, m − 1) =
R(r − 1, m − 1)^⊥, a · v = 0. As b ∈ R(m − r − 2, m − 1) = R(r, m − 1)^⊥, b · u = 0 and
b · v = 0 using R(r − 1, m − 1) ⊆ R(r, m − 1) from (i). We conclude that R(m − r −
1, m) ⊆ R(r, m)^⊥, completing (iv).
We make a few observations based on this theorem. First, since R(0, m) is the length
2^m repetition code, R(m − 1, m) = R(0, m)^⊥ is the code of all even weight vectors in
F_2^{2^m}. We had previously observed this about R(1, 2) and R(2, 3). Second, if m is odd and
r = (m − 1)/2 we see from parts (iii) and (iv) that R(r, m) = R((m − 1)/2, m) is self-dual
with minimum weight 2^{(m+1)/2}. Again we had observed this about R(1, 3). In the exercises,
you will also verify the general result that puncturing R(1, m) and then taking the subcode
of even weight vectors produces the simplex code S_m of length 2^m − 1.
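Parts (ii) and (iii) of Theorem 1.10.1 can be spot-checked by enumerating codewords for small m. The sketch below (ours) repeats the recursive construction of G(r, m) so as to be self-contained:

```python
from itertools import product
from math import comb

def rm_gen(r, m):
    """Generator matrix G(r, m) of R(r, m) via the (u | u + v) recursion."""
    if r == 0:
        return [[1] * (2 ** m)]
    if r == m:
        return [[int(i == j) for j in range(2 ** m)] for i in range(2 ** m)]
    top = [row + row for row in rm_gen(r, m - 1)]
    bottom = [[0] * 2 ** (m - 1) + row for row in rm_gen(r - 1, m - 1)]
    return top + bottom

for m in range(1, 4):
    for r in range(0, m + 1):
        G = rm_gen(r, m)
        k, n = len(G), 2 ** m
        assert k == sum(comb(m, i) for i in range(r + 1))       # part (ii)
        minwt = min(
            sum(sum(a * g[j] for a, g in zip(coeffs, G)) % 2 for j in range(n))
            for coeffs in product([0, 1], repeat=k) if any(coeffs)
        )
        assert minwt == 2 ** (m - r)                            # part (iii)
```

The check is limited to m ≤ 3 to keep the codeword enumeration trivial; the theorem itself holds for all m.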
Exercise 62 In this exercise we produce another generator matrix G′′(1, m) for R(1, m).
Define

G′′(1, 1) = [ 1 1 ]
            [ 0 1 ] .

For m ≥ 2, recursively define

G′(1, m) = [ G′′(1, m − 1)   G′′(1, m − 1) ]
           [ 00 · · · 0      11 · · · 1    ] ,

and define G′′(1, m) to be the matrix obtained from G′(1, m) by removing the bottom row
and placing it as row two in the matrix, moving the rows below down.
(a) Show that G ′′ (1, 1) is a generator matrix for R(1, 1).
(b) Find the matrices G ′ (1, 2), G ′′ (1, 2), G ′ (1, 3), and G ′′ (1, 3).
(c) What do you notice about the columns of the matrices obtained from G ′′ (1, 2) and
G ′′ (1, 3) by deleting the first row and the first column?
(d) Show using induction, part (a), and the definition (1.7) that G ′′ (1, m) is a generator
matrix for R(1, m).
(e) Formulate a generalization of part (c) that applies to the matrix obtained from G ′′ (1, m)
by deleting the first row and the first column. Prove your generalization is correct.
(f) Show that the code generated by the matrix obtained from G ′′ (1, m) by deleting the
first row and the first column is the simplex code S m .
(g) Show that the code R(m − 2, m) is the extended binary Hamming code Ĥ_m.
Notice that this problem shows that the extended binary Hamming codes and their duals
are Reed–Muller codes.
1.11
Encoding, decoding, and Shannon’s Theorem
Since the inception of coding theory, codes have been used in many diverse ways; in
addition to providing reliability in communication channels and computers, they give high
fidelity on compact disc recordings, and they have also permitted successful transmission
of pictures from outer space. New uses constantly appear. As a primary application of
codes is to store or transmit data, we introduce the process of encoding and decoding a
message.
1.11.1 Encoding
Let C be an [n, k] linear code over the field F_q with generator matrix G. This code has q^k
codewords which will be in one-to-one correspondence with q^k messages. The simplest way
to view these messages is as k-tuples x in F_q^k. The most common way to encode the message
x is as the codeword c = xG. If G is in standard form, the first k coordinates of the codeword
c are the information symbols x; the remaining n − k symbols are the parity check symbols,
that is, the redundancy added to x in order to help recover x if errors occur. The generator
matrix G may not be in standard form. If, however, there exist column indices i 1 , i 2 , . . . , i k
such that the k × k matrix consisting of these k columns of G is the k × k identity matrix,
then the message is found in the k coordinates i 1 , i 2 , . . . , i k of the codeword scrambled but
otherwise unchanged; that is, the message symbol x j is in component i j of the codeword.
If this occurs, we say that the encoder is systematic. If G is replaced by another generator
matrix, the encoding of x will, of course, be different. By row reduction, one could always
choose a generator matrix so that the encoder is systematic. Furthermore, if we are willing
to replace the code with a permutation equivalent one, by Theorem 1.6.2, we can choose a
code with generator matrix in standard form, and therefore the first k bits of the codeword
make up the message.
The method just described shows how to encode a message x using the generator matrix of
the code C. There is a second way to encode using the parity check matrix H . This is easiest to
do when G is in standard form [Ik | A]. In this case H = [−A^T | In−k] by Theorem 1.2.1.
Suppose that x = x1 · · · xk is to be encoded as the codeword c = c1 · · · cn. As G is in
standard form, c1 · · · ck = x1 · · · xk. So we need to determine the n − k parity check symbols
(redundancy symbols) ck+1 · · · cn. As 0 = Hc^T = [−A^T | In−k]c^T, we get A^T x^T = [ck+1 · · · cn]^T.
One can generalize this when G is a systematic encoder.
Example 1.11.1 Let C be the [6, 3, 3] binary code with generator and parity check matrices

G = [ 1 0 0 1 0 1 ]           [ 1 1 0 1 0 0 ]
    [ 0 1 0 1 1 0 ]  and  H = [ 0 1 1 0 1 0 ] ,
    [ 0 0 1 0 1 1 ]           [ 1 0 1 0 0 1 ]
respectively. Suppose we desire to encode the message x = x1 x2 x3 to obtain the codeword
c = c1 c2 · · · c6 . Using G to encode yields
c = xG = (x1 , x2 , x3 , x1 + x2 , x2 + x3 , x1 + x3 ).
(1.8)
Using H to encode, 0 = H cT leads to the system
0 = c1 + c 2 + c 4 ,
0 = c2 + c3 + c5 ,
0 = c1 + c3 + c6 .
As G is in standard form, c1 c2 c3 = x1 x2 x3 , and solving this system clearly gives the same
codeword as in (1.8).
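The encoding c = xG of Example 1.11.1 is a one-line computation over F_2. This sketch (ours) uses the generator matrix of the example:

```python
# Encoding for the [6, 3, 3] code of Example 1.11.1: c = xG over F2
G = [
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 1],
]

def encode(x):
    """Encode the message x = (x1, x2, x3) as the codeword xG."""
    return [sum(a * g[j] for a, g in zip(x, G)) % 2 for j in range(6)]

# matches (1.8): c = (x1, x2, x3, x1+x2, x2+x3, x1+x3)
assert encode([1, 0, 1]) == [1, 0, 1, 1, 1, 0]
```

Since G is in standard form, the first three coordinates of the codeword are the message itself.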
Exercise 63 Let C be the Hamming code H3 of Example 1.2.3, with parity check matrix
0 1 1 1 1 0 0
H = 1 0 1 1 0 1 0 .
1 1 0 1 0 0 1
(a) Construct the generator matrix for C and use it to encode the message 0110.
(b) Use your generator matrix to encode x1 x2 x3 x4 .
(c) Use H to encode the messages 0110 and x1 x2 x3 x4 .
Since there is a one-to-one correspondence between messages and codewords, one often
works only with the encoded messages (the codewords) at both the sending and receiving
end. In that case, at the decoding end in Figure 1.1, we are satisfied with an estimate ĉ
obtained by the decoder from y, hoping that this is the codeword c that was transmitted.
However, if we are interested in the actual message, a question arises as to how to recover
the message from a codeword. If the codeword c = xG, and G is in standard form, the
message is the first k components of c; if the encoding is systematic, it is easy to recover
the message by looking at the coordinates of G containing the identity matrix. What can
be done otherwise? Because G has independent rows, there is an n × k matrix K such
that G K = Ik ; K is called a right inverse for G and is not necessarily unique. As c = xG,
cK = xG K = x.
Exercise 64 Let G be a k × n generator matrix for a binary code C.
(a) Suppose G = [Ik | A]. Show that

K = [ Ik ]
    [ O  ] ,

where O is the (n − k) × k zero matrix, is a right inverse of G.
(b) Find a 7 × 3 right inverse K of G, where
1 0 1 1 0 1 1
G = 1 1 0 1 0 1 0 .
0 0 1 1 1 1 0
Hint: One way K can be found is by using four zero rows and the three rows of I3 .
(c) Find a 7 × 4 right inverse K of G, where

G = [ 1 1 0 1 0 0 0 ]
    [ 0 1 1 0 1 0 0 ]
    [ 0 0 1 1 0 1 0 ]
    [ 0 0 0 1 1 0 1 ] .
Remark: In Chapter 4, we will see that G generates a cyclic code and the structure of
G is typical of the structure of generator matrices of such codes.
(d) What is the message x if xG = 1000110, where G is given in part (c)?
Figure 1.2 Binary symmetric channel: each transmitted bit (0 or 1) is received correctly with probability 1 − ̺ and flipped with probability ̺.
1.11.2 Decoding and Shannon’s Theorem
The process of decoding, that is, determining which codeword (and thus which message x)
was sent when a vector y is received, is more complex. Finding efficient (fast) decoding
algorithms is a major area of research in coding theory because of their practical applications.
In general, encoding is easy and decoding is hard if the code has a reasonably large
size.
In order to set the stage for decoding, we begin with one possible mathematical model
of a channel that transmits binary data. This model is called the binary symmetric channel
(or BSC) with crossover probability ̺ and is illustrated in Figure 1.2. If 0 or 1 is sent, the
probability it is received without error is 1 − ̺; if a 0 (respectively 1) is sent, the probability
that a 1 (respectively 0) is received is ̺. In most practical situations ̺ is very small. This
is an example of a discrete memoryless channel (or DMC), a channel in which inputs
and outputs are discrete and the probability of error in one bit is independent of previous
bits. We will assume that it is more likely that a bit is received correctly than in error; so
̺ < 1/2.[5]
If E1 and E2 are events, let prob(E1) denote the probability that E1 occurs and prob(E1 |
E2) the probability that E1 occurs given that E2 occurs. Assume that c ∈ F_2^n is sent and
y ∈ F_2^n is received and decoded as ĉ ∈ F_2^n. So prob(c | y) is the probability that the codeword
c is sent given that y is received, and prob(y | c) is the probability that y is received given that
the codeword c is sent. These probabilities can be computed from the statistics associated
with the channel. The probabilities are related by Bayes' Rule

prob(c | y) = prob(y | c) prob(c) / prob(y),
where prob(c) is the probability that c is sent and prob(y) is the probability that y is received.
There are two natural means by which a decoder can make a choice based on these two
probabilities. First, the decoder could choose ĉ = c for the codeword c with prob(c | y)
maximum; such a decoder is called a maximum a posteriori probability (or MAP) decoder.
[5] While ̺ is usually very small, if ̺ > 1/2, the probability that a bit is received in error is higher than the
probability that it is received correctly. So one strategy is to interchange 0 and 1 immediately at the receiving
end. This converts the BSC with crossover probability ̺ to a BSC with crossover probability 1 − ̺ < 1/2. This
of course does not help if ̺ = 1/2; in this case communication is not possible – see Exercise 77.
Symbolically, a MAP decoder makes the decision

ĉ = arg max_{c∈C} prob(c | y).

Here arg max_{c∈C} prob(c | y) is the argument c of the probability function prob(c | y) that
maximizes this probability. Alternately, the decoder could choose ĉ = c for the codeword c
with prob(y | c) maximum; such a decoder is called a maximum likelihood (or ML) decoder.
Symbolically, an ML decoder makes the decision

ĉ = arg max_{c∈C} prob(y | c).        (1.9)
Consider ML decoding over a BSC. If y = y1 · · · yn and c = c1 · · · cn,

prob(y | c) = ∏_{i=1}^{n} prob(yi | ci),

since we assumed that bit errors are independent. By Figure 1.2, prob(yi | ci) = ̺ if yi ≠ ci
and prob(yi | ci) = 1 − ̺ if yi = ci. Therefore

prob(y | c) = ̺^{d(y,c)} (1 − ̺)^{n−d(y,c)} = (1 − ̺)^n (̺/(1 − ̺))^{d(y,c)}.        (1.10)
Since 0 < ̺ < 1/2, 0 < ̺/(1 − ̺) < 1. Therefore maximizing prob(y | c) is equivalent
to minimizing d(y, c), that is, finding the codeword c closest to the received vector y in
Hamming distance; this is called nearest neighbor decoding. Hence on a BSC, maximum
likelihood and nearest neighbor decoding are the same.
Let e = y − c so that y = c + e. The effect of noise in the communication channel is to
add an error vector e to the codeword c, and the goal of decoding is to determine e. Nearest
neighbor decoding is equivalent to finding a vector e of smallest weight such that y − e is in
the code. This error vector need not be unique since there may be more than one codeword
closest to y; in other words, (1.9) may not have a unique solution. When we have a decoder
capable of finding all codewords nearest to the received vector y, then we have a complete
decoder.
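A complete decoder for a small code can be written as an exhaustive search; over a BSC with ̺ < 1/2 this realizes ML decoding by (1.10). The function names are ours, and the example reuses the [6, 3, 3] code of Example 1.11.1:

```python
from itertools import product

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def nearest_codewords(y, G, q=2):
    """All codewords of the code generated by G (over F_q) closest to y."""
    n, k = len(G[0]), len(G)
    best, best_d = [], len(y) + 1
    for coeffs in product(range(q), repeat=k):
        c = [sum(a * g[j] for a, g in zip(coeffs, G)) % q for j in range(n)]
        d = hamming_distance(y, c)
        if d < best_d:
            best, best_d = [c], d
        elif d == best_d:
            best.append(c)
    return best

G = [[1, 0, 0, 1, 0, 1], [0, 1, 0, 1, 1, 0], [0, 0, 1, 0, 1, 1]]
# one error in a codeword of this d = 3 code is corrected uniquely
assert nearest_codewords([1, 1, 1, 1, 1, 0], G) == [[1, 0, 1, 1, 1, 0]]
```

Because the function returns every nearest codeword, not just one, it is a complete decoder in the sense just defined; of course the exhaustive search is feasible only for codes with few codewords.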
To examine vectors closest to a given codeword, the concept of spheres about codewords
proves useful. The sphere of radius r centered at a vector u in F_q^n is defined to be the set

S_r(u) = { v ∈ F_q^n | d(u, v) ≤ r }

of all vectors whose distance from u is less than or equal to r. The number of vectors in
S_r(u) equals

∑_{i=0}^{r} (n choose i)(q − 1)^i.        (1.11)
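Formula (1.11) is easy to evaluate and to confirm by brute force for small parameters; the helper name sphere_size below is ours:

```python
from itertools import product
from math import comb

def sphere_size(n, q, r):
    """Number of vectors of F_q^n within Hamming distance r of a fixed center, by (1.11)."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

# brute-force confirmation for a small case: n = 4, q = 3, r = 2
n, q, r = 4, 3, 2
center = (0,) * n
count = sum(1 for v in product(range(q), repeat=n)
            if sum(a != b for a, b in zip(center, v)) <= r)
assert count == sphere_size(n, q, r) == 33   # 1 + 4*2 + 6*4
```

The count is independent of the center u, since translating the sphere by −u is a distance-preserving bijection.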
These spheres are pairwise disjoint provided their radius is chosen small enough.
Theorem 1.11.2 If d is the minimum distance of a code C (linear or nonlinear) and t =
⌊(d − 1)/2⌋, then spheres of radius t about distinct codewords are disjoint.
Proof: If z ∈ St (c1 ) ∩ St (c2 ), where c1 and c2 are codewords, then by the triangle inequality
(Theorem 1.4.1(iv)),
d(c1 , c2 ) ≤ d(c1 , z) + d(z, c2 ) ≤ 2t < d,
implying that c1 = c2 .
Corollary 1.11.3 With the notation of the previous theorem, if a codeword c is sent and y
is received where t or fewer errors have occurred, then c is the unique codeword closest to
y. In particular, nearest neighbor decoding uniquely and correctly decodes any received
vector in which at most t errors have occurred in transmission.
Exercise 65 Prove that the number of vectors in Sr (u) is given by (1.11).
For purposes of decoding as many errors as possible, this corollary implies that for given
n and k, we wish to find a code with as high a minimum weight d as possible. Alternately,
given n and d, one wishes to send as many messages as possible; thus we want to find a
code with the largest number of codewords, or, in the linear case, the highest dimension.
We may relax these requirements somewhat if we can find a code with an efficient decoding
algorithm.
Since the minimum distance of C is d, there exist two distinct codewords such that the
spheres of radius t + 1 about them are not disjoint. Therefore if more than t errors occur,
nearest neighbor decoding may yield more than one nearest codeword. Thus C is a t-error-correcting code but not a (t + 1)-error-correcting code. The packing radius of a code is the
largest radius of spheres centered at codewords so that the spheres are pairwise disjoint.
This discussion shows the following two facts about the packing radius.
Theorem 1.11.4 Let C be an [n, k, d] code over Fq . The following hold:
(i) The packing radius of C equals t = ⌊(d − 1)/2⌋.
(ii) The packing radius t of C is characterized by the property that nearest neighbor decoding
always decodes correctly a received vector in which t or fewer errors have occurred but
will not always decode correctly a received vector in which t + 1 errors have occurred.
The decoding problem now becomes one of finding an efficient algorithm that will correct
up to t errors. One of the most obvious decoding algorithms is to examine all codewords
until one is found with distance t or less from the received vector. But obviously this is
a realistic decoding algorithm only for codes with a small number of codewords. Another
obvious algorithm is to make a table consisting of a nearest codeword for each of the q^n
vectors in F_q^n and then look up a received vector in the table in order to decode it. This is
impractical if q^n is very large.
For an [n, k, d] linear code C over F_q, we can, however, devise an algorithm using a
table with q^{n−k} rather than q^n entries where one can find the nearest codeword by looking
up one of these q^{n−k} entries. This general decoding algorithm for linear codes is called
syndrome decoding. Because our code C is an elementary abelian subgroup of the additive
group of F_q^n, its distinct cosets x + C partition F_q^n into q^{n−k} sets of size q^k. Two vectors x
and y belong to the same coset if and only if y − x ∈ C. The weight of a coset is the smallest
weight of a vector in the coset, and any vector of this smallest weight in the coset is called
a coset leader. The zero vector is the unique coset leader of the code C. More generally,
every coset of weight at most t = ⌊(d − 1)/2⌋ has a unique coset leader.
Exercise 66 Do the following:
(a) Prove that if C is an [n, k, d] code over Fq , every coset of weight at most t = ⌊(d − 1)/2⌋
has a unique coset leader.
(b) Find a nonzero binary code of length 4 and minimum weight d in which all cosets have
unique coset leaders and some coset has weight greater than t = ⌊(d − 1)/2⌋.
Choose a parity check matrix H for C. The syndrome of a vector x in F_q^n with respect to
the parity check matrix H is the vector in F_q^{n−k} defined by

syn(x) = Hx^T.

The code C consists of all vectors whose syndrome equals 0. As H has rank n − k, every
vector in F_q^{n−k} is a syndrome. If x1, x2 ∈ F_q^n are in the same coset of C, then x1 − x2 = c ∈ C.
Therefore syn(x1) = H(x2 + c)^T = Hx2^T + Hc^T = Hx2^T = syn(x2). Hence x1 and x2 have
the same syndrome. On the other hand, if syn(x1) = syn(x2), then H(x2 − x1)^T = 0 and so
x2 − x1 ∈ C. Thus we have the following theorem.
Theorem 1.11.5 Two vectors belong to the same coset if and only if they have the same
syndrome.
Hence there exists a one-to-one correspondence between cosets of C and syndromes. We
denote by C_s the coset of C consisting of all vectors in F_q^n with syndrome s.
Suppose a codeword sent over a communication channel is received as a vector y. Since
in nearest neighbor decoding we seek a vector e of smallest weight such that y − e ∈ C,
nearest neighbor decoding is equivalent to finding a vector e of smallest weight in the coset
containing y, that is, a coset leader of the coset containing y. The Syndrome Decoding
Algorithm is the following implementation of nearest neighbor decoding. We begin with a
fixed parity check matrix H .
I. For each syndrome s ∈ F_q^{n−k}, choose a coset leader e_s of the coset C_s. Create a table
pairing the syndrome with the coset leader.
This process can be somewhat involved, but this is a one-time preprocessing task that
is carried out before received vectors are analyzed. One method of computing this table
will be described shortly. After producing the table, received vectors can be decoded.
II. After receiving a vector y, compute its syndrome s using the parity check matrix H.
III. y is then decoded as the codeword y − e_s.
Syndrome decoding requires a table with only q^{n−k} entries, which may be a vast improvement over a table of q^n vectors showing which codeword is closest to each of these.
However, there is a cost for shortening the table: before looking in the table of syndromes,
one must perform a matrix-vector multiplication in order to determine the syndrome of the
received vector. Then the table is used to look up the syndrome and find the coset leader.
How do we construct the table of syndromes as described in Step I? We briefly discuss
this for binary codes; one can extend this easily to nonbinary codes. Given the t-error-correcting code C of length n with parity check matrix H, we can construct the syndromes
as follows. The coset of weight 0 has coset leader 0. Consider the n cosets of weight 1.
43
1.11 Encoding, decoding, and Shannon’s Theorem
Choose an n-tuple with a 1 in position i and 0s elsewhere; the coset leader is the n-tuple and
the associated syndrome is column i of H. For the (n choose 2) cosets of weight 2, choose an n-tuple
with two 1s in positions i and j, with i < j, and the rest 0s; the coset leader is the n-tuple
and the associated syndrome is the sum of columns i and j of H. Continue in this manner
through the cosets of weight t. We could choose to stop here. If we do, we can decode any
received vector with t or fewer errors, but if the received vector has more than t errors, it
will be either incorrectly decoded (if the syndrome of the received vector is in the table) or
not decoded at all (if the syndrome of the received vector is not in the table). If we decide
to go on and compute syndromes of weights w greater than t, we continue in the same
fashion with the added feature that we must check for possible repetition of syndromes.
This repetition will occur if the n-tuple of weight w is not a coset leader or it is a coset
leader with the same syndrome as another leader of weight w, in which cases we move on
to the next n-tuple. We continue until we have 2^{n−k} syndromes. The table produced will
allow us to perform nearest neighbor decoding.
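The three steps of the Syndrome Decoding Algorithm can be sketched as follows for the [6, 3, 3] binary code of Example 1.11.1; scanning vectors in order of increasing weight makes the first vector seen with each syndrome a coset leader (the names are ours):

```python
from itertools import product

H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]          # parity check matrix of Example 1.11.1

def syndrome(x):
    return tuple(sum(h[j] * x[j] for j in range(len(x))) % 2 for h in H)

# Step I: build the table of coset leaders, one per syndrome.
n, redundancy = 6, 3
table = {}
for x in sorted(product([0, 1], repeat=n), key=sum):   # increasing weight
    table.setdefault(syndrome(x), x)
assert len(table) == 2 ** redundancy                   # one leader per coset

def decode(y):
    """Steps II-III: look up the coset leader and subtract it from y."""
    e = table[syndrome(y)]
    return [(a - b) % 2 for a, b in zip(y, e)]

# a single error (here in position 2) is corrected
assert decode([1, 1, 1, 1, 1, 0]) == [1, 0, 1, 1, 1, 0]
```

As noted in the text, the table construction is a one-time preprocessing step; each received vector then costs one matrix-vector product and one table lookup.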
Syndrome decoding is particularly simple for the binary Hamming codes H_r with parameters [n = 2^r − 1, 2^r − 1 − r, 3]. We do not have to create the table for syndromes and
corresponding coset leaders. This is because the coset leaders are unique and are the 2^r
vectors of weight at most 1. Let H_r be the parity check matrix whose columns are the
binary numerals for the numbers 1, 2, . . . , 2^r − 1. Since the syndrome of the binary n-tuple
of weight 1 whose unique 1 is in position i is the r-tuple representing the binary numeral for
i, the syndrome immediately gives the coset leader and no table is required for syndrome
decoding. Thus Syndrome Decoding for Binary Hamming Codes takes the form:
I. After receiving a vector y, compute its syndrome s using the parity check matrix Hr .
II. If s = 0, then y is in the code and y is decoded as y; otherwise, s is the binary numeral
for some positive integer i and y is decoded as the codeword obtained from y by adding
1 to its ith bit.
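The table-free procedure can be sketched as follows for r = 4, so n = 15 (the function name is ours):

```python
# Syndrome decoding for the binary Hamming code H_r: the syndrome of a
# single-bit error in position i is the binary numeral for i, because the
# columns of H_r are the numerals 1, ..., 2^r - 1 in order.
r = 4
n = 2 ** r - 1
H = [[(i >> (r - 1 - b)) & 1 for i in range(1, n + 1)] for b in range(r)]

def decode_hamming(y):
    s = [sum(H[b][j] * y[j] for j in range(n)) % 2 for b in range(r)]
    i = int("".join(map(str, s)), 2)   # read the syndrome as a binary numeral
    if i == 0:
        return y[:]                     # y is already a codeword
    c = y[:]
    c[i - 1] ^= 1                       # flip the bit in position i (1-indexed)
    return c

# flipping any single bit of the zero codeword is corrected
for i in range(n):
    y = [0] * n
    y[i] = 1
    assert decode_hamming(y) == [0] * n
```

Exercise 67 below carries out the same computation by hand for three received vectors.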
The above procedure is easily modified for Hamming codes over other fields. This is
explored in the exercises.
Exercise 67 Construct the parity check matrix of the binary Hamming code H4 of length
15 where the columns are the binary numbers 1, 2, . . . , 15 in that order. Using this parity
check matrix decode the following vectors, and then check that your decoded vectors are
actually codewords.
(a) 001000001100100,
(b) 101001110101100,
(c) 000100100011000.
Exercise 68 Construct a table of all syndromes of the ternary tetracode of Example 1.3.3
using the generator matrix of that example to construct the parity check matrix. Find a coset
leader for each of the syndromes. Use your parity check matrix to decode the following
vectors, and then check that your decoded vectors are actually codewords.
(a) (1, 1, 1, 1),
(b) (1, −1, 0, −1),
(c) (0, 1, 0, 1).
Basic concepts of linear codes
Exercise 69 Let C be the [6, 3, 3] binary code with generator matrix G and parity check matrix H given by

G = [1 0 0 0 1 1]          H = [0 1 1 1 0 0]
    [0 1 0 1 0 1]   and        [1 0 1 0 1 0]
    [0 0 1 1 1 0]              [1 1 0 0 0 1].
(a) Construct a table of coset leaders and associated syndromes for the eight cosets of C.
(b) One of the cosets in part (a) has weight 2. This coset has three coset leaders. Which
coset is it and what are its coset leaders?
(c) Using part (a), decode the following received vectors:
(i) 110110,
(ii) 110111,
(iii) 110001.
(d) For one of the received vectors in part (c) there is ambiguity as to what codeword
it should be decoded to. List the other nearest neighbors possible for this received
vector.
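The table construction described at the start of this section can be sketched by brute force for a code this small (a sketch for illustration; `H` below is the parity check matrix of Exercise 69):

```python
# Hedged sketch: brute-force syndrome table for the [6, 3, 3] binary code
# of Exercise 69, then nearest neighbor decoding with it.
from itertools import product

H = [(0, 1, 1, 1, 0, 0),
     (1, 0, 1, 0, 1, 0),
     (1, 1, 0, 0, 0, 1)]

def syndrome(v):
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)

# Scan all 6-tuples in order of increasing weight; the first vector seen
# with a new syndrome is a coset leader for that coset.
leaders = {}
for v in sorted(product([0, 1], repeat=6), key=sum):
    leaders.setdefault(syndrome(v), v)

def decode(y):
    """Nearest neighbor decoding: subtract the chosen coset leader."""
    e = leaders[syndrome(y)]
    return tuple((a + b) % 2 for a, b in zip(y, e))

print(len(leaders))              # -> 8, matching 2^(n-k) = 2^3 cosets
print(decode((1, 0, 0, 0, 1, 1)))  # a codeword decodes to itself
```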
Exercise 70 Let Ĥ3 be the extended Hamming code with parity check matrix

Ĥ3 = [1 1 1 1 1 1 1 1]
     [0 0 0 0 1 1 1 1]
     [0 0 1 1 0 0 1 1]
     [0 1 0 1 0 1 0 1].

Number the coordinates 0, 1, 2, . . . , 7. Notice that if we delete the top row of Ĥ3, we have the coordinate numbers in binary. We can decode Ĥ3 without a table of syndromes and coset leaders using the following algorithm. If y is received, compute syn(y) using the parity check matrix Ĥ3. If syn(y) = (0, 0, 0, 0)^T, then y has no errors. If syn(y) = (1, a, b, c)^T, then there is a single error in the coordinate position abc (written in binary). If syn(y) = (0, a, b, c)^T with (a, b, c) ≠ (0, 0, 0), then there are two errors, one in coordinate position 0 and the other in the coordinate position abc (written in binary).
(a) Decode the following vectors using this algorithm:
(i) 10110101,
(ii) 11010010,
(iii) 10011100.
(b) Verify that this procedure provides a nearest neighbor decoding algorithm for Ĥ3. To do this, the following must be verified. All weight 0 and weight 1 errors can be corrected, accounting for nine of the 16 syndromes. All weight 2 errors cannot necessarily be corrected, but all weight 2 errors lead to one of the seven syndromes remaining.
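The algorithm of Exercise 70 can be sketched directly (an illustration, not part of the text; the matrix is the Ĥ3 of that exercise):

```python
# Hedged sketch of the decoding algorithm of Exercise 70 for the [8, 4, 4]
# extended Hamming code, with coordinates numbered 0, ..., 7.

H_EXT = [(1, 1, 1, 1, 1, 1, 1, 1),
         (0, 0, 0, 0, 1, 1, 1, 1),
         (0, 0, 1, 1, 0, 0, 1, 1),
         (0, 1, 0, 1, 0, 1, 0, 1)]

def decode_extended(y):
    s = [sum(h * x for h, x in zip(row, y)) % 2 for row in H_EXT]
    c = list(y)
    top, abc = s[0], 4 * s[1] + 2 * s[2] + s[3]
    if top == 1:                 # syndrome (1,a,b,c): single error at abc
        c[abc] ^= 1
    elif abc != 0:               # syndrome (0,a,b,c) with (a,b,c) != 0:
        c[0] ^= 1                # two errors, at coordinate 0 and at abc
        c[abc] ^= 1
    return c                     # syndrome 0: no errors

# A single error in coordinate 5 of the zero codeword:
y = [0, 0, 0, 0, 0, 1, 0, 0]
print(decode_extended(y))        # -> [0, 0, 0, 0, 0, 0, 0, 0]
```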
A received vector may contain both errors (where a transmitted symbol is read as a different symbol) and erasures (where a transmitted symbol is unreadable). These are fundamentally different in that the locations of errors are unknown, whereas the locations of erasures are known. Suppose c ∈ C is sent, and the received vector y contains ν errors and ε erasures. One could certainly not guarantee that y can be corrected if ε ≥ d, because there may be a codeword other than c closer to y. So assume that ε < d. Puncture C in the ε positions where the erasures occurred in y to obtain an [n − ε, k*, d*] code C*. Note that k* = k by Theorem 1.5.7(ii), and d* ≥ d − ε. Puncture c and y similarly to obtain c* and y*; these can be viewed as sent and received vectors using the code C* with y* containing ν errors but no erasures. If 2ν < d − ε ≤ d*, c* can be recovered from y* by Corollary 1.11.3. There is a unique codeword c ∈ C which when punctured produces c*; for if puncturing both c and c′ yields c*, then wt(c − c′) ≤ ε < d, a contradiction unless c = c′. The following theorem summarizes this discussion and extends Corollary 1.11.3.

Theorem 1.11.6 Let C be an [n, k, d] code. If a codeword c is sent and y is received where ν errors and ε erasures have occurred, then c is the unique codeword in C closest to y provided 2ν + ε < d.
Exercise 71 Let Ĥ3 be the extended Hamming code with parity check matrix

Ĥ3 = [1 1 1 1 1 1 1 1]
     [0 0 0 0 1 1 1 1]
     [0 0 1 1 0 0 1 1]
     [0 1 0 1 0 1 0 1].

Correct the received vector 101⋆0111, where ⋆ is an erasure.
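In the spirit of Theorem 1.11.6, an erased vector can be corrected by a brute-force search for the codeword that best matches the received vector on its non-erased positions (a sketch for illustration, using the parity check matrix of Exercise 71 and `*` in place of ⋆):

```python
# Hedged sketch: correct a received vector containing erasures by searching
# the [8, 4, 4] extended Hamming code for the codeword agreeing best with
# the received vector on the non-erased coordinates.
from itertools import product

H_EXT = [(1, 1, 1, 1, 1, 1, 1, 1),
         (0, 0, 0, 0, 1, 1, 1, 1),
         (0, 0, 1, 1, 0, 0, 1, 1),
         (0, 1, 0, 1, 0, 1, 0, 1)]

def in_code(v):
    return all(sum(h * x for h, x in zip(row, v)) % 2 == 0 for row in H_EXT)

codewords = [v for v in product([0, 1], repeat=8) if in_code(v)]

def correct(received):
    """received is a string of '0', '1', and '*' (erasure) symbols."""
    known = [(i, int(b)) for i, b in enumerate(received) if b != '*']
    # distance measured on the non-erased coordinates only; by Theorem
    # 1.11.6 the minimizer is unique when 2*nu + eps < d
    return min(codewords, key=lambda c: sum(c[i] != b for i, b in known))

print(correct("101*0111"))       # -> (1, 0, 1, 0, 0, 1, 0, 1)
```

Here 2ν + ε = 2 · 1 + 1 = 3 < 4 = d, so the theorem guarantees the answer is the unique closest codeword.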
In Exercises 70 and 71 we explored the decoding of the [8, 4, 4] extended Hamming code Ĥ3. In Exercise 70, we had the reader verify that there are eight cosets of weight 1 and seven of weight 2. Each of these cosets is a nonlinear code and so it is appropriate to discuss the weight distribution of these cosets and to tabulate the results. In general, the complete coset weight distribution of a linear code is the weight distribution of each coset of the code. The next example gives the complete coset weight distribution of Ĥ3. As every [8, 4, 4] code is equivalent to Ĥ3, by Exercise 56, this is the complete coset weight distribution of any [8, 4, 4] binary code.
Example 1.11.7 The complete coset weight distribution of the [8, 4, 4] extended binary Hamming code Ĥ3 is given in the following table:

                 Number of vectors of given weight
Coset weight    0   1   2   3    4   5   6   7   8    Number of cosets
     0          1   0   0   0   14   0   0   0   1           1
     1          0   1   0   7    0   7   0   1   0           8
     2          0   0   4   0    8   0   4   0   0           7
Note that the first line is the weight distribution of Ĥ3. The second line is the weight distribution of each coset of weight one. This code has the special property that all cosets of a given weight have the same weight distribution. This is not the case for codes in general. In Exercise 73 we ask the reader to verify some of the information in the table. Notice that this code has the all-one vector 1 and hence the table is symmetric about the middle weight. Notice also that an even weight coset has only even weight vectors, and an odd weight coset has only odd weight vectors. These observations hold in general; see Exercise 72. The information in this table helps explain the decoding of Ĥ3. We see that all the cosets
of weight 2 have four coset leaders. This implies that when we decode a received vector in
which two errors had been made, we actually have four equally likely codewords that could
have been sent.
Exercise 72 Let C be a binary code of length n. Prove the following.
(a) If C is an even code, then an even weight coset of C has only even weight vectors, and
an odd weight coset has only odd weight vectors.
(b) If C contains the all-one vector 1, then in a fixed coset, the number of vectors of weight
i is the same as the number of vectors of weight n − i, for 0 ≤ i ≤ n.
Exercise 73 Consider the complete coset weight distribution of Ĥ3 given in Example 1.11.7. The results of Exercise 72 will be useful.
(a) Prove that the weight distribution of the cosets of weight 1 are as claimed.
(b) (Harder) Prove that the weight distribution of the cosets of weight 2 are as
claimed.
We conclude this section with a discussion of Shannon's Theorem in the framework of the decoding we have developed. Assume that the communication channel is a BSC with crossover probability ̺ on which syndrome decoding is used. The word error rate Perr for this channel and decoding scheme is the probability that the decoder makes an error, averaged over all codewords of C; for simplicity we assume that each codeword of C is equally likely to be sent. A decoder error occurs when ĉ = arg max_{c∈C} prob(y | c) is not the originally transmitted word c when y is received. The syndrome decoder makes a correct decision if y − c is a chosen coset leader. This probability is

̺^wt(y−c) (1 − ̺)^(n−wt(y−c))

by (1.10). Therefore the probability that the syndrome decoder makes a correct decision is Σ_{i=0}^{n} αi ̺^i (1 − ̺)^(n−i), where αi is the number of cosets of weight i. Thus

Perr = 1 − Σ_{i=0}^{n} αi ̺^i (1 − ̺)^(n−i).    (1.12)
Example 1.11.8 Suppose binary messages of length k are sent unencoded over a BSC with crossover probability ̺. This in effect is the same as using the [k, k] code F2^k. This code has a unique coset, the code itself, and its leader is the zero codeword of weight 0. Hence (1.12) shows that the probability of decoder error is

Perr = 1 − ̺^0 (1 − ̺)^k = 1 − (1 − ̺)^k.

This is precisely what we expect, as the probability of no decoding error is the probability (1 − ̺)^k that the k bits are received without error.
Example 1.11.9 We compare sending 2^4 = 16 binary messages unencoded to encoding using the [7, 4] binary Hamming code H3. Assume communication is over a BSC with crossover probability ̺. By Example 1.11.8, Perr = 1 − (1 − ̺)^4 for the unencoded data. H3 has one coset of weight 0 and seven cosets of weight 1. Hence Perr = 1 − (1 − ̺)^7 − 7̺(1 − ̺)^6 by (1.12). For example, if ̺ = 0.01, Perr without coding is 0.039 403 99. Using H3, it is 0.002 031 04 . . . .
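The numbers in Example 1.11.9 can be checked numerically from equation (1.12) (a sketch; the coset counts α0 = 1 and α1 = 7 for H3 are those stated in the example):

```python
# Hedged numerical check of Example 1.11.9: word error rate on a BSC with
# crossover probability p = 0.01, unencoded (k = 4) versus the [7, 4]
# Hamming code H3.

def p_err(alphas, n, p):
    """Perr = 1 - sum_i alpha_i p^i (1-p)^(n-i), from equation (1.12)."""
    return 1 - sum(a * p**i * (1 - p)**(n - i) for i, a in enumerate(alphas))

p = 0.01
print(round(p_err([1], 4, p), 8))       # unencoded: -> 0.03940399
print(round(p_err([1, 7], 7, p), 8))    # Hamming H3: -> 0.00203104
```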
1.11 Encoding, decoding, and Shannon’s Theorem
Exercise 74 Assume communication is over a BSC with crossover probability ̺.
(a) Using Example 1.11.7, compute Perr for the extended Hamming code Ĥ3.
(b) Prove that the values of Perr for both H3, found in Example 1.11.9, and Ĥ3 are equal.
(c) Which code, H3 or Ĥ3, would be better to use when communicating over a BSC? Why?
Exercise 75 Assume communication is over a BSC with crossover probability ̺ using the [23, 12, 7] binary Golay code G23.
(a) In Exercises 78 and 80 you will see that for G23 there are (23 choose i) cosets of weight i for 0 ≤ i ≤ 3 and no others. Compute Perr for this code.
(b) Compare Perr for sending 2^12 binary messages unencoded to encoding with G23 when ̺ = 0.01.
Exercise 76 Assume communication is over a BSC with crossover probability ̺ using the [24, 12, 8] extended binary Golay code G24.
(a) In Example 8.3.2 you will see that for G24 there are 1, 24, 276, 2024, and 1771 cosets of weights 0, 1, 2, 3, and 4, respectively. Compute Perr for this code.
(b) Prove that the values of Perr for both G23, found in Exercise 75, and G24 are equal.
(c) Which code, G23 or G24, would be better to use when communicating over a BSC? Why?
For a BSC with crossover probability ̺, the capacity of the channel is
C(̺) = 1 + ̺ log2 ̺ + (1 − ̺) log2 (1 − ̺).
The capacity C(̺) = 1 − H2(̺), where H2(̺) is the binary entropy function that we define in Section 2.10.3. For binary symmetric channels, Shannon's Theorem is as follows.6
Theorem 1.11.10 (Shannon) Let δ > 0 and R < C(̺). Then for large enough n, there
exists an [n, k] binary linear code C with k/n ≥ R such that Perr < δ when C is used for
communication over a BSC with crossover probability ̺. Furthermore no such code exists
if R > C(̺).
Shannon’s Theorem remains valid for nonbinary codes and other channels provided the
channel capacity is defined appropriately. The fraction k/n is called the rate, or information
rate, of an [n, k] code and gives a measure of how much information is being transmitted;
we discuss this more extensively in Section 2.10.
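As a small numerical companion to the capacity formula above (a sketch; the conventions C(0) = C(1) = 1 follow from the usual limit 0 · log2 0 = 0):

```python
# Hedged sketch: capacity of a BSC, C(p) = 1 + p*log2(p) + (1-p)*log2(1-p).
from math import log2

def bsc_capacity(p):
    if p in (0.0, 1.0):          # the p*log2(p) terms vanish in the limit
        return 1.0
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)

print(bsc_capacity(0.5))             # -> 0.0 (no reliable communication)
print(round(bsc_capacity(0.01), 4))  # -> 0.9192
```

The value C(1/2) = 0 is consistent with Exercise 77(c): Shannon's Theorem promises no positive rate of reliable communication when ̺ = 1/2.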
Exercise 77 Do the following.
(a) Graph the channel capacity as a function of ̺ for 0 < ̺ < 1.
(b) In your graph, what is the region in which arbitrarily reliable communication can occur
according to Shannon’s Theorem?
(c) What is the channel capacity when ̺ = 1/2? What does Shannon’s Theorem say about
communication when ̺ = 1/2? (See Footnote 5 earlier in this section.)
6 Shannon's original theorem was stated for nonlinear codes but was later shown to be valid for linear codes as well.
1.12 Sphere Packing Bound, covering radius, and perfect codes
The minimum distance d is a simple measure of the goodness of a code. For a given length
and number of codewords, a fundamental problem in coding theory is to produce a code
with the largest possible d. Alternatively, given n and d, determine the maximum number
Aq (n, d) of codewords in a code over Fq of length n and minimum distance at least d. The
number A2 (n, d) is also denoted by A(n, d). The same question can be asked for linear codes.
Namely, what is the maximum number Bq (n, d) (B(n, d) in the binary case) of codewords
in a linear code over Fq of length n and minimum weight at least d? Clearly, Bq (n, d) ≤
Aq (n, d). For modest values of n and d, A(n, d) and B(n, d) have been determined and
tabulated; see Chapter 2.
The fact that the spheres of radius t about codewords are pairwise disjoint immediately
implies the following elementary inequality, commonly referred to as the Sphere Packing
Bound or the Hamming Bound.
Theorem 1.12.1 (Sphere Packing Bound)

Bq(n, d) ≤ Aq(n, d) ≤ q^n / Σ_{i=0}^{t} (n choose i)(q − 1)^i,

where t = ⌊(d − 1)/2⌋.
Proof: Let C be a (possibly nonlinear) code over Fq of length n and minimum distance d. Suppose that C contains M codewords. By Theorem 1.11.2, the spheres of radius t about distinct codewords are disjoint. As there are α = Σ_{i=0}^{t} (n choose i)(q − 1)^i total vectors in any one of these spheres by (1.11), and the spheres are disjoint, Mα cannot exceed the number q^n of vectors in Fq^n. The result is now clear.
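The bound is easy to evaluate (a sketch; integer floor division is valid since Aq(n, d) is an integer, and equality in the bound signals a perfect code, as discussed below):

```python
# Hedged sketch: evaluate the Sphere Packing (Hamming) Bound.
from math import comb

def sphere_packing_bound(q, n, d):
    t = (d - 1) // 2
    return q**n // sum(comb(n, i) * (q - 1)**i for i in range(t + 1))

print(sphere_packing_bound(2, 7, 3))    # -> 16 = 2^4, met by the Hamming code H3
print(sphere_packing_bound(2, 23, 7))   # -> 4096 = 2^12, met by the binary Golay code
print(sphere_packing_bound(3, 11, 5))   # -> 729 = 3^6, met by the ternary Golay code
```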
From the proof of the Sphere Packing Bound, we see that when we get equality in the bound, we actually fill the space Fq^n with disjoint spheres of radius t. In other words, every vector in Fq^n is contained in precisely one sphere of radius t centered about a codeword. When we have a code for which this is true, the code is called perfect.
Example 1.12.2 Recall that the Hamming code Hq,r over Fq is an [n, k, 3] code, where n = (q^r − 1)/(q − 1) and k = n − r. Then t = 1 and

q^n / Σ_{i=0}^{t} (n choose i)(q − 1)^i = q^n / (1 + n(q − 1)) = q^n / q^r = q^k.

Thus Hq,r is perfect.
Exercise 78 Prove that the [23, 12, 7] binary and the [11, 6, 5] ternary Golay codes are
perfect.
Exercise 79 Show that the following codes are perfect:
(a) the codes C = Fq^n,
(b) the codes consisting of exactly one codeword (the zero vector in the case of linear
codes),
(c) the binary repetition codes of odd length, and
(d) the binary codes of odd length consisting of a vector c and the complementary vector
c with 0s and 1s interchanged.
These codes are called trivial perfect codes.
Exercise 80 Prove that a perfect t-error-correcting linear code of length n over Fq has precisely (n choose i)(q − 1)^i cosets of weight i for 0 ≤ i ≤ t and no other cosets. Hint: How many weight i vectors in Fq^n are there? Could distinct vectors of weights i and j with i ≤ t and j ≤ t be in the same coset? Use the equality in the Sphere Packing Bound.
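The count in Exercise 80 can be checked by brute force on the smallest interesting perfect code (a sketch; the matrix `H` is the parity check matrix of the [7, 4] binary Hamming code H3 described earlier, with column i the binary numeral for i):

```python
# Hedged check of Exercise 80 for the perfect [7, 4] binary Hamming code:
# it should have exactly C(7,0) = 1 coset of weight 0, C(7,1) = 7 cosets
# of weight 1, and no others.
from collections import Counter
from itertools import product

H = [(0, 0, 0, 1, 1, 1, 1),    # column i is the binary numeral for i + 1
     (0, 1, 1, 0, 0, 1, 1),
     (1, 0, 1, 0, 1, 0, 1)]

def syndrome(v):
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)

# Scanning vectors by increasing weight, the first vector seen with each
# syndrome has the minimum weight of its coset, i.e. the coset's weight.
coset_weight = {}
for v in sorted(product([0, 1], repeat=7), key=sum):
    coset_weight.setdefault(syndrome(v), sum(v))

print(Counter(coset_weight.values()))   # -> Counter({1: 7, 0: 1})
```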
So the Hamming codes are perfect, as are two of the Golay codes, as shown in Exercise 78.
Furthermore, Theorem 1.8.2 shows that all linear codes of the same length, dimension, and
minimum weight as a Hamming code are equivalent. Any of these codes can be called the
Hamming code. There are also some trivial perfect codes as described in Exercise 79. Thus
we have part of the proof of the following theorem.
Theorem 1.12.3
(i) There exist perfect single error-correcting codes over Fq which are not linear; all such codes have parameters corresponding to those of the Hamming codes, namely, length n = (q^r − 1)/(q − 1) with q^(n−r) codewords and minimum distance 3. The only perfect single error-correcting linear codes over Fq are the Hamming codes.
(ii) The only nontrivial perfect multiple error-correcting codes have the same length, number of codewords, and minimum distance as either the [23, 12, 7] binary Golay code or the [11, 6, 5] ternary Golay code.
(iii) Any binary (respectively, ternary) possibly nonlinear code with 2^12 (respectively, 3^6) vectors containing the 0 vector with length 23 (respectively, 11) and minimum distance 7 (respectively, 5) is equivalent to the [23, 12, 7] binary (respectively, [11, 6, 5] ternary) Golay code.
The classification of the perfect codes as summarized in this theorem was a significant and difficult piece of mathematics, to which a number of authors contributed. We will prove part (iii) in Chapter 10. The rest of the proof can be found in [137, Section 5]. A portion of part (ii) is proved in Exercise 81.
Exercise 81 The purpose of this exercise is to prove part of Theorem 1.12.3(ii). Let C be
an [n, k, 7] perfect binary code.
(a) Using equality in the Sphere Packing Bound, prove that

(n + 1)[(n + 1)^2 − 3(n + 1) + 8] = 3 · 2^(n−k+1).

(b) Prove that n + 1 is either 2^b or 3 · 2^b where, in either case, b ≤ n − k + 1.
(c) Prove that b < 4.
(d) Prove that n = 23 or n = 7.
(e) Name two codes that are perfect [n, k, 7] codes, one with n = 7 and the other with
n = 23.
One can obtain nonlinear perfect codes by taking a coset of a linear perfect code; see Exercise 82. Theorem 1.12.3 shows that all perfect multiple error-correcting nonlinear codes are cosets of the binary Golay code of length 23 or the ternary Golay code of length 11. On the other hand, there are nonlinear perfect single error-correcting codes which are not cosets of Hamming codes; these were first constructed by Vasil'ev [338].
Exercise 82 Prove that a coset of a linear perfect code is also a perfect code.
Let C be an [n, k, d] code over Fq and let t = ⌊(d − 1)/2⌋. When you do not have a perfect code, in order to fill the space Fq^n with spheres centered at codewords, the spheres must have radius larger than t. Of course when you increase the sphere size, not all spheres will be pairwise disjoint. We define the covering radius ρ = ρ(C) to be the smallest integer s such that Fq^n is the union of the spheres of radius s centered at the codewords of C. Equivalently,

ρ(C) = max_{x∈Fq^n} min_{c∈C} d(x, c).
Obviously, t ≤ ρ(C) and t = ρ(C) if and only if C is perfect. By Theorem 1.11.4, the packing
radius of a code is the largest radius of spheres centered at codewords so that the spheres
are disjoint. So a code is perfect if and only if its covering radius equals its packing radius.
If the code is not perfect, its covering radius is larger than its packing radius.
For a nonlinear code C, the covering radius ρ(C) is defined in the same way to be

ρ(C) = max_{x∈Fq^n} min_{c∈C} d(x, c).
Again if d is the minimum distance of C and t = ⌊(d − 1)/2⌋, then t ≤ ρ(C) and t = ρ(C)
if and only if C is perfect. The theorems that we prove later in this section are only for linear
codes.
If C is a code with packing radius t and covering radius t + 1, C is called quasi-perfect. There are many known linear and nonlinear quasi-perfect codes (e.g. certain double error-correcting BCH codes and some punctured Preparata codes). However, unlike perfect codes, there is no general classification.
Example 1.12.4 By Exercise 56, the binary [8, 4, 4] code is shown to be unique, in the sense that all such codes are equivalent to Ĥ3. In Example 1.11.7, we give the complete coset weight distribution of this code. Since there are no cosets of weight greater than 2, the covering radius, ρ(Ĥ3), is 2. Since the packing radius is t = ⌊(4 − 1)/2⌋ = 1, this code is quasi-perfect. Both the covering and packing radius of the nonextended Hamming code H3 equal 1. This is an illustration of the fact that extending a binary code will not increase its packing radius (error-correcting capability) but will increase its covering radius. See Theorem 1.12.6(iv) below.
Recall that the weight of a coset of a code C is the smallest weight of a vector in the coset.
The definition of the covering radius implies the following characterization of the covering
radius of a linear code in terms of coset weights and in terms of syndromes.
Theorem 1.12.5 Let C be a linear code with parity check matrix H . Then:
(i) ρ(C) is the weight of the coset of largest weight;
(ii) ρ(C) is the smallest number s such that every nonzero syndrome is a combination of s
or fewer columns of H , and some syndrome requires s columns.
Exercise 83 Prove Theorem 1.12.5.
We conclude this chapter by collecting some elementary facts about the covering radius
of codes and coset leaders, particularly involving codes arising in Section 1.5. More on
covering radius can be found in Chapter 11.
Theorem 1.12.6 Let C be an [n, k] code over Fq. Let Ĉ be the extension of C, and let C* be a code obtained from C by puncturing on some coordinate. The following hold:
(i) If C = C1 ⊕ C2, then ρ(C) = ρ(C1) + ρ(C2).
(ii) ρ(C*) = ρ(C) or ρ(C*) = ρ(C) − 1.
(iii) ρ(Ĉ) = ρ(C) or ρ(Ĉ) = ρ(C) + 1.
(iv) If q = 2, then ρ(Ĉ) = ρ(C) + 1.
(v) Assume that x is a coset leader of C. If x′ ∈ Fq^n is a vector all of whose nonzero components agree with the corresponding components of x, then x′ is also a coset leader of C. In particular, if there is a coset of weight s, there is also a coset of any weight less than s.
Proof: The proofs of the first three assertions are left as exercises.
For (iv), let x = x1 · · · xn be a coset leader of C. Let x′ = x1 · · · xn 1. By part (iii), it suffices to show that x′ is a coset leader of Ĉ. Let c = c1 · · · cn ∈ C, and let ĉ = c1 · · · cn cn+1 be its extension. If c has even weight, then wt(ĉ + x′) = wt(c + x) + 1 ≥ wt(x) + 1. Assume c has odd weight. Then wt(ĉ + x′) = wt(c + x). If x has even (odd) weight, then c + x has odd (even) weight by Theorem 1.4.3, and so wt(c + x) > wt(x) as x is a coset leader. Thus in all cases, wt(ĉ + x′) ≥ wt(x) + 1 = wt(x′), and so x′ is a coset leader of Ĉ.
To prove (v), it suffices, by induction, to verify the result when x = x1 · · · xn is a coset leader and x′ = x′1 · · · x′n, where xj = x′j for all j ≠ i and xi ≠ x′i = 0. Notice that wt(x) = wt(x′) + 1. Suppose that x′ is not a coset leader. Then there is a codeword c ∈ C such that x′ + c is a coset leader and hence

wt(x′ + c) ≤ wt(x′) − 1 = wt(x) − 2.    (1.13)

But as x and x′ disagree in only one coordinate, wt(x + c) ≤ wt(x′ + c) + 1. Using (1.13), this implies that wt(x + c) ≤ wt(x) − 1, a contradiction as x is a coset leader.
Exercise 84 Prove parts (i), (ii), and (iii) of Theorem 1.12.6.
The next example illustrates that it is possible to extend or puncture a code and leave the
covering radius unchanged. Compare this to Theorem 1.12.6(ii) and (iii).
Example 1.12.7 Let C be the ternary code with generator matrix [1 1 −1]. Computing the covering radius, we see that ρ(C) = ρ(Ĉ) = 2. If D = Ĉ and we puncture D on the last coordinate to obtain D* = C, we have ρ(D) = ρ(D*).
In the binary case by Theorem 1.12.6(iv), whenever we extend a code, we increase the
covering radius by 1. But when we puncture a binary code we may not reduce the covering
radius.
Example 1.12.8 Let C be the binary code with generator matrix

G = [1 0 1 1]
    [0 1 1 1],
and let C ∗ be obtained from C by puncturing on the last coordinate. It is easy to see that
ρ(C) = ρ(C ∗ ) = 1. Also if D is the extension of C ∗ , ρ(D) = 2, consistent with Theorem
1.12.6.
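Covering radii of codes this small can be checked by exhaustive search (a sketch; the generator matrix [1 0 1 1; 0 1 1 1] is the one read off from Example 1.12.8, an assumption worth noting):

```python
# Hedged sketch: brute-force covering radius of a small binary code,
# checking the values claimed in Example 1.12.8.
from itertools import product

def span(G):
    """All codewords generated by the rows of G (binary arithmetic)."""
    k = len(G)
    return [tuple(sum(a * g for a, g in zip(coeffs, col)) % 2
                  for col in zip(*G))
            for coeffs in product([0, 1], repeat=k)]

def covering_radius(code, n):
    return max(min(sum(a != b for a, b in zip(x, c)) for c in code)
               for x in product([0, 1], repeat=n))

C = span([(1, 0, 1, 1), (0, 1, 1, 1)])
C_star = [c[:-1] for c in C]                # puncture the last coordinate
D = [c + (sum(c) % 2,) for c in C_star]     # extend C* by a parity bit

print(covering_radius(C, 4),
      covering_radius(C_star, 3),
      covering_radius(D, 4))                # -> 1 1 2
```

The output 1, 1, 2 matches ρ(C) = ρ(C*) = 1 and ρ(D) = 2, consistent with Theorem 1.12.6(iv).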
2 Bounds on the size of codes
In this chapter, we present several bounds on the number of codewords in a linear or
nonlinear code given the length n and minimum distance d of the code. In Section 1.12 we
proved the Sphere Packing (or Hamming) Bound, which gives an upper bound on the size
of a code. This chapter is devoted to developing several other upper bounds along with two
lower bounds. There are fewer lower bounds presented, as lower bounds are often tied to
particular constructions of codes. For example, if a code with a given length n and minimum
distance d is produced, its size becomes a lower bound on the code size. In this chapter we
will speak about codes that meet a given bound. If the bound is a lower bound on the size of
a code in terms of its length and minimum distance, then a code C meets the lower bound
if the size of C is at least the size given by the lower bound. If the bound is an upper bound
on the size of a code in terms of its length and minimum distance, then C meets the upper
bound if its size equals the size given by the upper bound.
We present the upper bounds first after we take a closer look at the concepts previously
developed.
2.1 Aq(n, d) and Bq(n, d)
In this section, we will consider both linear and nonlinear codes. An (n, M, d) code C over
Fq is a code of length n with M codewords whose minimum distance is d. The code C can
be either linear or nonlinear; if it is linear, it is an [n, k, d] code, where k = logq M and d
is the minimum weight of C; see Theorem 1.4.2.
We stated the Sphere Packing Bound using the notation Bq (n, d) and Aq (n, d), where
Bq (n, d), respectively Aq (n, d), is the largest number of codewords in a linear, respectively
arbitrary (linear or nonlinear), code over Fq of length n and minimum distance at least d.
A code of length n over Fq and minimum distance at least d will be called optimal if it has
Aq (n, d) codewords (or Bq (n, d) codewords in the case that C is linear). There are other
perspectives on optimizing a code. For example, one could ask to find the largest d, given
n and M, such that there is a code over Fq of length n with M codewords and minimum
distance d. Or, find the smallest n, given M and d such that there is a code over Fq of
length n with M codewords and minimum distance d. We choose to focus on Aq (n, d) and
Bq (n, d) here.
We begin with some rather simple properties of Aq (n, d) and Bq (n, d). First we have the
following obvious facts:
Table 2.1 Upper and lower bounds on A2(n, d) for 6 ≤ n ≤ 24

 n    d = 4            d = 6          d = 8      d = 10
 6    4                2              1          1
 7    8                2              1          1
 8    16               2              2          1
 9    20               4              2          1
10    40               6              2          2
11    72               12             2          2
12    144              24             4          2
13    256              32             4          2
14    512              64             8          2
15    1024             128            16         4
16    2048             256            32         4
17    2720–3276        256–340        36–37      6
18    5312–6552        512–680        64–72      10
19    10496–13104      1024–1288      128–144    20
20    20480–26208      2048–2372      256–279    40
21    36864–43689      2560–4096      512        42–48
22    73728–87378      4096–6941      1024       50–88
23    147456–173491    8192–13774     2048       76–150
24    294912–344308    16384–24106    4096       128–280
Theorem 2.1.1 Bq (n, d) ≤ Aq (n, d) and Bq (n, d) is a nonnegative integer power of q.
So Bq (n, d) is a lower bound for Aq (n, d) and Aq (n, d) is an upper bound for Bq (n, d). The
Sphere Packing Bound is an upper bound on Aq (n, d) and hence on Bq (n, d).
Tables which lead to information about the values of Aq (n, d) or Bq (n, d) have been
computed and are regularly updated. These tables are for small values of q and moderate to
large values of n. The most comprehensive table is compiled by A. E. Brouwer [32], which
gives upper and lower bounds on the minimum distance d of an [n, k] linear code over Fq .
A less extensive table giving bounds for A2 (n, d) is kept by S. Litsyn [205].
To illustrate, we reproduce a table due to many authors and recently updated by Agrell,
Vardy, Zeger, and Litsyn in [2, 205]. Most of the upper bounds in Table 2.1 are obtained
from the bounds presented in this chapter together with the Sphere Packing Bound; ad hoc
methods in certain cases produce the remaining values. Notice that Table 2.1 contains only
even values of d, a consequence of the following result.
Theorem 2.1.2 Let d > 1. Then:
(i) Aq (n, d) ≤ Aq (n − 1, d − 1) and Bq (n, d) ≤ Bq (n − 1, d − 1), and
(ii) if d is even, A2 (n, d) = A2 (n − 1, d − 1) and B2 (n, d) = B2 (n − 1, d − 1).
Furthermore:
(iii) if d is even and M = A2 (n, d), then there exists a binary (n, M, d) code such that all
codewords have even weight and the distance between all pairs of codewords is also
even.
Proof: Let C be a code (linear or nonlinear) with M codewords and minimum distance d. Puncturing on any coordinate gives a code C* also with M codewords; otherwise, if C* had fewer codewords, there would exist two codewords of C which differ in only one position, implying d = 1. This proves (i); to complete (ii), we only need to show that A2(n, d) ≥ A2(n − 1, d − 1) (or B2(n, d) ≥ B2(n − 1, d − 1) when C is linear). To that end let C be a binary code with M codewords, length n − 1, and minimum distance d − 1. Extend C by adding an overall parity check to obtain a code Ĉ of length n and minimum distance d, since d − 1 is odd. Because Ĉ has M codewords, A2(n, d) ≥ A2(n − 1, d − 1) (or B2(n, d) ≥ B2(n − 1, d − 1)). For (iii), if C is a binary (n, M, d) code with d even, the punctured code C* as previously stated is an (n − 1, M, d − 1) code. Extending C* produces an (n, M, d) code Ĉ* since d − 1 is odd; furthermore this code has only even weight codewords. Since d(x, y) = wt(x + y) = wt(x) + wt(y) − 2 wt(x ∩ y), the distance between codewords is even.
Exercise 85 In the proof of Theorem 2.1.2, we claim that if C is a binary code of length n − 1 and minimum weight d − 1, where d − 1 is odd, then the extended code Ĉ of length n has minimum distance d. In Section 1.5.2 we stated that this is true if C is linear, where it is obvious since the minimum distance is the minimum weight. Prove that it is also true when C is nonlinear.
Theorem 2.1.2(ii) shows that any table of values of A2 (n, d) or B2 (n, d) only needs to
be compiled for d either always odd or d always even. Despite the fact that A2 (n, d) =
A2 (n − 1, d − 1) when d is even, we want to emphasize that a given bound for A2 (n, d)
may not be the same bound as for A2 (n − 1, d − 1). So since these values are equal, we
can always choose the smaller upper bound, respectively larger lower bound, as a common
upper bound, respectively lower bound, for both A2 (n, d) and A2 (n − 1, d − 1).
Example 2.1.3 By using n = 7 and d = 4 (that is, t = 1) in the Sphere Packing Bound, we find that A2(7, 4) ≤ 16. On the other hand, using n = 6 and d = 3 (still t = 1), the Sphere Packing Bound yields 64/7 ≈ 9.14, implying that A2(6, 3) ≤ 9. So by Theorem 2.1.2(ii), an upper bound for both A2(7, 4) and A2(6, 3) is 9.
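The two computations in Example 2.1.3 can be reproduced directly (a sketch; here the bound is left as a fraction so the 64/7 value is visible before rounding down):

```python
# Hedged check of Example 2.1.3: Sphere Packing Bound for (n, d) = (7, 4)
# versus (6, 3), exploiting A2(7, 4) = A2(6, 3) from Theorem 2.1.2(ii).
from math import comb

def sphere_packing(q, n, d):
    t = (d - 1) // 2
    return q**n / sum(comb(n, i) * (q - 1)**i for i in range(t + 1))

print(sphere_packing(2, 7, 4))   # -> 16.0
print(sphere_packing(2, 6, 3))   # 64/7 = 9.14..., so A2(6, 3) <= 9
```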
Exercise 86 Let C be a code (possibly nonlinear) over Fq with minimum distance d. Fix a
codeword c in C. Let C 1 = {x − c | x ∈ C}. Prove that C 1 contains the zero vector 0, has the
same number of codewords as C, and also has minimum distance d. Prove also that C = C 1
if C is linear.
Exercise 87 By Example 2.1.3, A2 (7, 4) ≤ 9.
(a) Prove that B2 (7, 4) ≤ 8.
(b) Find a binary [7, 3, 4] code thus verifying B2 (7, 4) = 23 = 8. In our terminology, this
code is optimal.
(c) Show that A2 (7, 4) is either 8 or 9. (Table 2.1 shows that it is actually 8, a fact we will
verify in the next section.)
Exercise 88 By Table 2.1, A2 (13, 10) = 2.
(a) By computing the Sphere Packing Bound using (n, d) = (13, 10) and (12, 9), find the
best sphere packing upper bound for A2 (13, 10).
(b) Using part (a), give an upper bound on B2 (13, 10).
(c) Prove that B2 (13, 10) = A2 (13, 10) is exactly 2 by carrying out the following.
(i) Construct a [13, 1, 10] linear code.
(ii) Show that no binary code of length 13 and minimum distance 10 or more can
contain three codewords. (Hint: By Exercise 86, you may assume that such a code
contains the vector 0.)
Exercise 89 This exercise verifies the entry for n = 16 and d = 4 in Table 2.1.
(a) Use the Sphere Packing Bound to get an upper bound on A2 (15, 3). What (linear) code
meets this bound? Is this code perfect?
(b) Use two pairs of numbers (n, d) and the Sphere Packing Bound to get an upper bound
on A2 (16, 4). What (linear) code meets this bound? Is this code perfect?
(c) Justify the value of A2 (16, 4) given in Table 2.1.
In examining Table 2.1, notice that d = 2 is not considered. The values for A2 (n, 2) and
B2 (n, 2) can be determined for all n.
Theorem 2.1.4 A2(n, 2) = B2(n, 2) = 2^(n−1).
Proof: By Theorem 2.1.2(ii), A2(n, 2) = A2(n − 1, 1). But clearly A2(n − 1, 1) ≤ 2^(n−1), and the entire space F2^(n−1) is a code of length n − 1 and minimum distance 1, implying A2(n − 1, 1) = 2^(n−1). By Theorem 2.1.1, as F2^(n−1) is linear, 2^(n−1) = B2(n − 1, 1) = B2(n, 2).
There is another set of table values that can easily be found as a result of the next
theorem.
Theorem 2.1.5 Aq (n, n) = Bq (n, n) = q.
Proof: The linear code of size q consisting of all multiples of the all-one vector of length
n (that is, the repetition code over Fq ) has minimum distance n. So by Theorem 2.1.1,
Aq (n, n) ≥ Bq (n, n) ≥ q. If Aq (n, n) > q, there exists a code with more than q codewords
and minimum distance n. Hence at least two of the codewords agree on some coordinate;
but then these two codewords are less than distance n apart, a contradiction. So Aq (n, n) =
Bq (n, n) = q.
In tables such as Table 2.1, it is often the case that when one bound is found for a particular
n and d, this bound can be used to find bounds for “nearby” n and d. For instance, once you
have an upper bound for Aq (n − 1, d) or Bq (n − 1, d), there is an upper bound on Aq (n, d)
or Bq (n, d).
Theorem 2.1.6 Aq (n, d) ≤ q Aq (n − 1, d) and Bq (n, d) ≤ q Bq (n − 1, d).
Proof: Let C be a (possibly nonlinear) code over Fq of length n and minimum distance
at least d with M = Aq (n, d) codewords. Let C(α) be the subcode of C in which every
codeword has α in coordinate n. Then, for some α, C(α) contains at least M/q codewords.
Puncturing this code on coordinate n produces a code of length n − 1 and minimum distance
d. Therefore M/q ≤ Aq (n − 1, d) giving Aq (n, d) ≤ q Aq (n − 1, d). We leave the second
inequality as an exercise.
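Theorem 2.1.6 turns one known bound into bounds for all longer lengths. A one-line Python sketch (the function name is ours) iterates A_q(n, d) ≤ q A_q(n − 1, d), reproducing the chain asked for in Exercise 92:

```python
def lengthened_bound(known_bound, n_known, n_target, q=2):
    """Iterate A_q(n, d) <= q * A_q(n - 1, d) (Theorem 2.1.6) from length n_known up to n_target."""
    return known_bound * q ** (n_target - n_known)

# Exercise 92: from A_2(13, 6) = 32 we get A_2(14, 6) <= 64, A_2(15, 6) <= 128,
# and A_2(16, 6) <= 256.
assert lengthened_bound(32, 13, 14) == 64
assert lengthened_bound(32, 13, 16) == 256
```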
2.1 A_q(n, d) and B_q(n, d)
Exercise 90 Let C be an [n, k, d] linear code over Fq .
(a) Prove that if i is a fixed coordinate, either all codewords of C have 0 in that coordinate
position or the subset consisting of all codewords which have a 0 in coordinate position
i is an [n, k − 1, d] linear subcode of C.
(b) Prove that Bq (n, d) ≤ q Bq (n − 1, d).
Exercise 91 Verify the following values for A2 (n, d).
(a) Show that A2 (8, 6) = 2 by direct computation. (That is, show that there is a binary code
with two codewords of length 8 that are distance 6 apart; then show that no code with
three such codewords can exist. Use Exercise 86.)
(b) Show that A2 (9, 6) ≤ 4 using part (a) and Theorem 2.1.6. Construct a code meeting
this bound.
(c) What are B2 (8, 6) and B2 (9, 6)? Why?
Exercise 92 Assume that A2 (13, 6) = 32, as indicated in Table 2.1.
(a) Show that A2 (14, 6) ≤ 64, A2 (15, 6) ≤ 128, and A2 (16, 6) ≤ 256.
(b) Show that if you can verify that A2 (16, 6) = 256, then there is equality in the other
bounds in (a).
See also Exercise 108.
Exercise 93 Show that B2 (13, 6) ≤ 32 by assuming that a [13, 6, 6] binary code exists.
Obtain a contradiction by attempting to construct a generator matrix for this code in standard
form.
Exercise 94 Verify that A2 (24, 8) = 4096, consistent with Table 2.1.
Before proceeding to the other bounds, we observe that the covering radius of a code C
with Aq (n, d) codewords is at most d − 1. For if a code C with Aq (n, d) codewords has
covering radius d or higher, there is a vector x in Fqn at distance d or more from every
codeword of C; hence C ∪ {x} has one more codeword and minimum distance at least d.
The same observation holds for linear codes with Bq (n, d) codewords; such codes have
covering radius d − 1 or less, a fact left to the exercises. For future reference, we state these
in the following theorem.
Theorem 2.1.7 Let C be either a code over Fq with Aq (n, d) codewords or a linear code
over Fq with Bq (n, d) codewords. Then C has covering radius d − 1 or less.
Exercise 95 Prove that if C is a linear code over Fq with Bq (n, d) codewords, then C has
covering radius at most d − 1.
There are two types of bounds that we consider in this chapter. Until Section 2.10 the
bounds we consider in the chapter are valid for arbitrary values of n and d. Most of these
have asymptotic versions which hold for families of codes having lengths that go to infinity.
These asymptotic bounds are considered in Section 2.10.
There is a common technique used in many of the proofs of the upper bounds that we
examine. A code will be chosen and its codewords will become the rows of a matrix. There
will be some expression, related to the bound we are seeking, which must itself be bounded.
We will often look at the number of times a particular entry occurs in a particular column
of the matrix of codewords. From there we will be able to bound the expression and that
will lead directly to our desired upper bound.
2.2 The Plotkin Upper Bound
The purpose of having several upper bounds is that one may be smaller than another for a
given value of n and d. In general, one would like an upper bound as tight (small) as possible
so that there is hope that codes meeting this bound actually exist. The Plotkin Bound [285]
is an upper bound which often improves the Sphere Packing Bound on Aq (n, d); however,
it is only valid when d is sufficiently close to n.
Theorem 2.2.1 (Plotkin Bound) Let C be an (n, M, d) code over F_q such that rn < d, where r = 1 − q^{−1}. Then

  M ≤ ⌊d / (d − rn)⌋.

In particular,

  A_q(n, d) ≤ ⌊d / (d − rn)⌋,    (2.1)

provided rn < d. In the binary case,

  A_2(n, d) ≤ 2⌊d / (2d − n)⌋    (2.2)

if n < 2d.
Proof: Let

  S = Σ_{x∈C} Σ_{y∈C} d(x, y).

If x ≠ y for x, y ∈ C, then d ≤ d(x, y), implying that

  M(M − 1)d ≤ S.    (2.3)

Let M be the M × n matrix whose rows are the codewords of C. For 1 ≤ i ≤ n, let n_{i,α} be the number of times α ∈ F_q occurs in column i of M. As Σ_{α∈F_q} n_{i,α} = M for 1 ≤ i ≤ n, we have

  S = Σ_{i=1}^{n} Σ_{α∈F_q} n_{i,α}(M − n_{i,α}) = nM² − Σ_{i=1}^{n} Σ_{α∈F_q} n_{i,α}².    (2.4)

By the Cauchy–Schwartz inequality,

  ( Σ_{α∈F_q} n_{i,α} )² ≤ q Σ_{α∈F_q} n_{i,α}².

Using this, we obtain

  S ≤ nM² − Σ_{i=1}^{n} q^{−1} ( Σ_{α∈F_q} n_{i,α} )² = nrM².    (2.5)

Combining (2.3) and (2.5), we obtain M ≤ ⌊d/(d − rn)⌋ since M is an integer, which gives bound (2.1).
In the binary case, this can be slightly improved. We still have

  M ≤ ⌊d / (d − n/2)⌋ = ⌊2d / (2d − n)⌋,

using (2.3) and (2.5). If M is even, we can round the expression 2d/(2d − n) down to the nearest even integer, which gives (2.2). When M is odd, we do not use Cauchy–Schwartz. Instead, from (2.4), we observe that

  S = Σ_{i=1}^{n} [n_{i,0}(M − n_{i,0}) + n_{i,1}(M − n_{i,1})] = Σ_{i=1}^{n} 2 n_{i,0} n_{i,1}    (2.6)

because n_{i,0} + n_{i,1} = M. But the right-hand side of (2.6) is maximized when {n_{i,0}, n_{i,1}} = {(M − 1)/2, (M + 1)/2}; thus using (2.3),

  M(M − 1)d ≤ (1/2) n (M − 1)(M + 1).

Simplifying,

  M ≤ n / (2d − n) = 2d/(2d − n) − 1,

which proves (2.2) in the case that M is odd.
The Plotkin Bound has rather limited scope as it is only valid when n < 2d in the binary
case. However, we can examine what happens for “nearby” values, namely n = 2d and
n = 2d + 1.
Corollary 2.2.2 The following bounds hold:
(i) If d is even, A2 (2d, d) ≤ 4d.
(ii) If d is odd, A2 (2d, d) ≤ 2d + 2.
(iii) If d is odd, A2 (2d + 1, d) ≤ 4d + 4.
Proof: By Theorem 2.1.6, A2 (2d, d) ≤ 2A2 (2d − 1, d). But by the Plotkin Bound,
A2 (2d − 1, d) ≤ 2d, giving (i), regardless of the parity of d. If d is odd, we obtain a
better bound. By Theorem 2.1.2(ii), A2 (2d, d) = A2 (2d + 1, d + 1) if d is odd. Applying the Plotkin Bound, A2 (2d + 1, d + 1) ≤ 2d + 2, producing bound (ii). Finally if d is
odd, A2 (2d + 1, d) = A2 (2d + 2, d + 1) by Theorem 2.1.2(ii). Since A2 (2d + 2, d + 1) ≤
4(d + 1) by (i), we have (iii).
Example 2.2.3 The Sphere Packing Bound for A2 (17, 9) is 65 536/1607, yielding
A2 (18, 10) = A2 (17, 9) ≤ 40. However, by the Plotkin Bound, A2 (18, 10) ≤ 10. There is
a code meeting this bound as indicated by Table 2.1.
Example 2.2.4 The Sphere Packing Bound for A2 (14, 7) is 8192/235, yielding A2 (15, 8) =
A2 (14, 7) ≤ 34. However, by Corollary 2.2.2(ii), A2 (14, 7) ≤ 16. Again this bound is attained as indicated by Table 2.1.
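Both examples can be checked mechanically. A small Python sketch (the function name is ours) evaluates the binary Plotkin Bound (2.2) and, through the identity A_2(2d, d) = A_2(2d + 1, d + 1) for odd d, also the bound of Corollary 2.2.2(ii):

```python
def plotkin_bound_binary(n, d):
    """Plotkin Bound (2.2): A_2(n, d) <= 2 * floor(d / (2d - n)), valid for n < 2d."""
    if n >= 2 * d:
        raise ValueError("Plotkin Bound requires n < 2d")
    return 2 * (d // (2 * d - n))

# Example 2.2.3: A_2(18, 10) <= 10
assert plotkin_bound_binary(18, 10) == 10
# Corollary 2.2.2(ii) for odd d: A_2(2d, d) = A_2(2d + 1, d + 1) <= 2d + 2.
# With d = 7 this gives A_2(14, 7) <= 16, as in Example 2.2.4.
assert plotkin_bound_binary(2 * 7 + 1, 7 + 1) == 16
```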
Exercise 96 (a) Find the best Sphere Packing Bound for A2 (n, d) by choosing the smaller
of the Sphere Packing Bounds for A2 (n, d) and A2 (n − 1, d − 1) for the following values
of (n, d). (Note that d is even in each case and so A2 (n, d) = A2 (n − 1, d − 1).) (b) From
the Plotkin Bound and Corollary 2.2.2, where applicable, compute the best bound. (c) For
each (n, d), which bound (a) or (b) is the better bound? (d) What is the true value of A2 (n, d)
according to Table 2.1?
(i) (n, d) = (7, 4) (compare to Example 2.1.3), (8, 4).
(ii) (n, d) = (9, 6) (compare to Exercise 91), (10, 6), (11, 6).
(iii) (n, d) = (14, 8), (15, 8), (16, 8).
(iv) (n, d) = (16, 10), (17, 10), (18, 10), (19, 10), (20, 10).
2.3 The Johnson Upper Bounds
In this section we present a series of bounds due to Johnson [159]. In connection with these
bounds, we introduce the concept of constant weight codes. Bounds on these constant weight
codes will be used in Section 2.3.3 to produce upper bounds on Aq (n, d). A (nonlinear)
(n, M, d) code C over Fq is a constant weight code provided every codeword has the same
weight w. For example, the codewords of fixed weight in a linear code form a constant
weight code. If x and y are distinct codewords of weight w, then d(x, y) ≤ 2w. Therefore
we have the following simple observation.
Theorem 2.3.1 If C is a constant weight (n, M, d) code with codewords of weight w and
if M > 1, then d ≤ 2w.
Define Aq (n, d, w) to be the maximum number of codewords in a constant weight (n, M)
code over Fq of length n and minimum distance at least d whose codewords have weight
w. Obviously Aq (n, d, w) ≤ Aq (n, d).
Example 2.3.2 It turns out that there are 759 weight 8 codewords in the [24, 12, 8] extended binary Golay code. These codewords form a (24, 759, 8) constant weight code with
codewords of weight 8; thus 759 ≤ A2 (24, 8, 8).
We have the following bounds on Aq (n, d, w).
Theorem 2.3.3
(i) Aq (n, d, w) = 1 if d > 2w.
(ii) Aq (n, 2w, w) ≤ ⌊(n(q − 1)/w)⌋.
(iii) A2 (n, 2w, w) = ⌊n/w⌋.
(iv) A2 (n, 2e − 1, w) = A2 (n, 2e, w).
Proof: Part (i) is a restatement of Theorem 2.3.1. In an (n, M, 2w) constant weight code
C over Fq with codewords of weight w, no two codewords can have the same nonzero
entries in the same coordinate. Thus if M is the M × n matrix whose rows are the codewords of C, each column of M can have at most q − 1 nonzero entries. So M has at
most n(q − 1) nonzero entries. However, each row of M has w nonzero entries and so
Mw ≤ n(q − 1). This gives (ii). For (iii), let C = {c1 , . . . , c M }, where M = ⌊n/w⌋ and
ci is the vector of length n consisting of (i − 1)w 0s followed by w 1s followed by
n − iw 0s, noting that n − Mw ≥ 0. Clearly C is a constant weight binary (n, ⌊n/w⌋, 2w)
code. The existence of this code and part (ii) with q = 2 give (iii). Part (iv) is left for
Exercise 97.
Exercise 97 Show that two binary codewords of the same weight must have even distance
between them. Then use this to show that A2 (n, 2e − 1, w) = A2 (n, 2e, w).
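The construction in the proof of Theorem 2.3.3(iii) is concrete enough to write down directly. The following Python sketch (function names are ours) builds the (n, ⌊n/w⌋, 2w) code of disjoint weight-w blocks and confirms the pairwise distances:

```python
def disjoint_blocks_code(n, w):
    """The (n, floor(n/w), 2w) binary constant weight code from Theorem 2.3.3(iii):
    codeword i has its w ones in positions (i-1)w, ..., iw - 1."""
    M = n // w
    return [[1 if i * w <= j < (i + 1) * w else 0 for j in range(n)] for i in range(M)]

def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))

C = disjoint_blocks_code(10, 3)
assert len(C) == 10 // 3                      # M = floor(n/w) = 3 codewords
assert all(sum(c) == 3 for c in C)            # constant weight w = 3
assert all(hamming_distance(x, y) == 6        # distinct codewords are distance 2w apart
           for x in C for y in C if x != y)
```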
2.3.1 The Restricted Johnson Bound
We consider two bounds on Aq (n, d, w), the first of which we call the Restricted Johnson
Bound.
Theorem 2.3.4 (Restricted Johnson Bound for A_q(n, d, w))

  A_q(n, d, w) ≤ ⌊ nd(q − 1) / (qw² − 2(q − 1)nw + nd(q − 1)) ⌋

provided qw² − 2(q − 1)nw + nd(q − 1) > 0, and

  A_2(n, d, w) ≤ ⌊ nd / (2w² − 2nw + nd) ⌋

provided 2w² − 2nw + nd > 0.
Proof: The second bound is a special case of the first. The proof of the first uses the same
ideas as in the proof of the Plotkin Bound. Let C be an (n, M, d) constant weight code with
codewords of weight w. Let M be the M × n matrix whose rows are the codewords of C.
Let

  S = Σ_{x∈C} Σ_{y∈C} d(x, y).

If x ≠ y for x, y ∈ C, then d ≤ d(x, y), implying that

  M(M − 1)d ≤ S.    (2.7)

For 1 ≤ i ≤ n, let n_{i,α} be the number of times α ∈ F_q occurs in column i of M. So

  S = Σ_{i=1}^{n} Σ_{α∈F_q} n_{i,α}(M − n_{i,α}) = Σ_{i=1}^{n} (M n_{i,0} − n_{i,0}²) + Σ_{i=1}^{n} Σ_{α∈F_q*} (M n_{i,α} − n_{i,α}²),    (2.8)
where F_q* denotes the nonzero elements of F_q. We analyze each of the last two terms separately. First,

  Σ_{i=1}^{n} n_{i,0} = (n − w)M

because the left-hand side counts the number of 0s in the matrix M and each of the M rows of M has n − w 0s. Second, by the Cauchy–Schwartz inequality,

  ( Σ_{i=1}^{n} n_{i,0} )² ≤ n Σ_{i=1}^{n} n_{i,0}².

Combining these, we see that the first summation on the right-hand side of (2.8) satisfies

  Σ_{i=1}^{n} (M n_{i,0} − n_{i,0}²) ≤ (n − w)M² − (1/n) ( Σ_{i=1}^{n} n_{i,0} )² = (n − w)M² − (n − w)²M²/n.    (2.9)
A similar argument is used on the second summation of the right-hand side of (2.8). This time

  Σ_{i=1}^{n} Σ_{α∈F_q*} n_{i,α} = wM

because the left-hand side counts the number of nonzero elements in the matrix M and each of the M rows of M has w nonzero components. By the Cauchy–Schwartz inequality,

  ( Σ_{i=1}^{n} Σ_{α∈F_q*} n_{i,α} )² ≤ n(q − 1) Σ_{i=1}^{n} Σ_{α∈F_q*} n_{i,α}².

This yields

  Σ_{i=1}^{n} Σ_{α∈F_q*} (M n_{i,α} − n_{i,α}²) ≤ wM² − (1/(n(q − 1))) ( Σ_{i=1}^{n} Σ_{α∈F_q*} n_{i,α} )² = wM² − (wM)²/(n(q − 1)).    (2.10)

Combining (2.7), (2.8), (2.9), and (2.10), we obtain

  M(M − 1)d ≤ (n − w)M² − (n − w)²M²/n + wM² − (wM)²/(n(q − 1)),

which simplifies to

  (M − 1)d ≤ M (2(q − 1)nw − qw²) / (n(q − 1)).
Solving this inequality for M, we get

  M ≤ nd(q − 1) / (qw² − 2(q − 1)nw + nd(q − 1)),

provided the denominator is positive. This produces our bound.
Example 2.3.5 By the Restricted Johnson Bound A2 (7, 4, 4) ≤ 7. The subcode C of H3
consisting of the even weight codewords is a [7, 3, 4] code with exactly seven codewords
of weight 4. Therefore these seven vectors form a (7, 7, 4) constant weight code with
codewords of weight 4. Thus A2 (7, 4, 4) = 7.
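The Restricted Johnson Bound is a single rational expression, so both Example 2.3.5 and Exercise 99 can be verified with a short Python sketch (the function name is ours); it returns None when the positivity condition on the denominator fails:

```python
def restricted_johnson(n, d, w, q=2):
    """Restricted Johnson Bound on A_q(n, d, w); None when the condition
    q*w^2 - 2*(q-1)*n*w + n*d*(q-1) > 0 fails and the bound does not apply."""
    denom = q * w * w - 2 * (q - 1) * n * w + n * d * (q - 1)
    if denom <= 0:
        return None
    return n * d * (q - 1) // denom

assert restricted_johnson(7, 4, 4) == 7      # Example 2.3.5
assert restricted_johnson(10, 6, 4) == 5     # Exercise 99
assert restricted_johnson(24, 8, 8) is None  # the bound is inapplicable here (cf. Example 2.3.7)
```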
Exercise 98 Verify all claims in Example 2.3.5.
Exercise 99 Using the Restricted Johnson Bound, show that A2 (10, 6, 4) ≤ 5. Also construct a (10, 5, 6) constant weight binary code with codewords of weight 4.
2.3.2 The Unrestricted Johnson Bound
The bound in the previous subsection is “restricted” in the sense that the condition qw² − 2(q − 1)nw + nd(q − 1) > 0 is necessary. There is another bound on Aq (n, d, w), also due to Johnson,
which has no such restriction.
Theorem 2.3.6 (Unrestricted Johnson Bound for A_q(n, d, w))
(i) If 2w < d, then A_q(n, d, w) = 1.
(ii) If 2w ≥ d and d ∈ {2e − 1, 2e}, then, setting q* = q − 1,

  A_q(n, d, w) ≤ ⌊ (nq*/w) ⌊ ((n − 1)q*/(w − 1)) ⌊ ⋯ ⌊ ((n − w + e)q*)/e ⌋ ⋯ ⌋ ⌋ ⌋.

(iii) If w < e, then A_2(n, 2e − 1, w) = A_2(n, 2e, w) = 1.
(iv) If w ≥ e, then

  A_2(n, 2e − 1, w) = A_2(n, 2e, w) ≤ ⌊ (n/w) ⌊ ((n − 1)/(w − 1)) ⌊ ⋯ ⌊ (n − w + e)/e ⌋ ⋯ ⌋ ⌋ ⌋.
Proof: Part (i) is clear from Theorem 2.3.1. For part (ii), let C be an (n, M, d) constant weight code over F_q with codewords of weight w, where M = A_q(n, d, w). Let M be the M × n matrix of the codewords of C. Let F_q* be the nonzero elements of F_q. For 1 ≤ i ≤ n and α ∈ F_q*, let C_i(α) be the codewords in C which have α in column i. Suppose that C_i(α) has m_{i,α} codewords. The expression Σ_{i=1}^{n} m_{i,α} counts the number of times α occurs in the matrix M. Therefore Σ_{α∈F_q*} Σ_{i=1}^{n} m_{i,α} counts the number of nonzero entries in M. Since C is a constant weight code,

  Σ_{α∈F_q*} Σ_{i=1}^{n} m_{i,α} = wM.

But if you puncture C_i(α) on coordinate i, you obtain an (n − 1, m_{i,α}, d) code with codewords of weight w − 1. Thus m_{i,α} ≤ A_q(n − 1, d, w − 1), yielding

  wM = Σ_{α∈F_q*} Σ_{i=1}^{n} m_{i,α} ≤ q* n A_q(n − 1, d, w − 1).

Therefore,

  A_q(n, d, w) ≤ ⌊ (nq*/w) A_q(n − 1, d, w − 1) ⌋.    (2.11)

By induction, repeatedly using (2.11),

  A_q(n, d, w) ≤ ⌊ (nq*/w) ⌊ ((n − 1)q*/(w − 1)) ⌊ ⋯ ⌊ ((n − i + 1)q*/(w − i + 1)) A_q(n − i, d, w − i) ⌋ ⋯ ⌋ ⌋ ⌋

for any i. If d = 2e − 1, let i = w − e + 1; then A_q(n − i, d, w − i) = A_q(n − w + e − 1, 2e − 1, e − 1) = 1 by Theorem 2.3.3(i), and part (ii) holds in this case. If d = 2e, let i = w − e; then A_q(n − i, d, w − i) = A_q(n − w + e, 2e, e) ≤ ⌊((n − w + e)q*)/e⌋ by Theorem 2.3.3(ii), and part (ii) again holds in this case.
Parts (iii) and (iv) follow from (i) and (ii) with d = 2e − 1 using Theorem 2.3.3(iv).
Example 2.3.7 In Example 2.3.2 we showed that A_2(24, 8, 8) ≥ 759. The Restricted Johnson Bound cannot be used to obtain an upper bound on A_2(24, 8, 8), but the Unrestricted Johnson Bound can. By this bound,

  A_2(24, 8, 8) ≤ ⌊ (24/8) ⌊ (23/7) ⌊ (22/6) ⌊ (21/5) ⌊ 20/4 ⌋ ⌋ ⌋ ⌋ ⌋ = 759.

Thus A_2(24, 8, 8) = 759.
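The nested floors of Theorem 2.3.6(iii)–(iv) evaluate from the inside out. A Python sketch (the function name is ours) for the binary case reproduces the value in Example 2.3.7:

```python
def unrestricted_johnson_binary(n, d, w):
    """Unrestricted Johnson Bound, Theorem 2.3.6(iii)-(iv), for binary codes.
    d may be 2e - 1 or 2e; both give the same value by Theorem 2.3.3(iv)."""
    e = (d + 1) // 2
    if w < e:
        return 1
    value = (n - w + e) // e           # innermost floor
    for k in range(e + 1, w + 1):      # work outward through denominators e+1, ..., w
        value = (n - w + k) * value // k
    return value

assert unrestricted_johnson_binary(24, 8, 8) == 759   # Example 2.3.7
assert unrestricted_johnson_binary(16, 6, 3) == 5     # agrees with A_2(n, 2w, w) = floor(n/w)
```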
Exercise 100 Do the following:
(a) Prove that A2 (n, d, w) = A2 (n, d, n − w). Hint: If C is a binary constant weight code
of length n with all codewords of weight w, what is the code 1 + C?
(b) Prove that

  A_2(n, d, w) ≤ ⌊ (n/(n − w)) A_2(n − 1, d, w) ⌋.

Hint: Use (a) and (2.11).
(c) Show directly that A2 (7, 4, 6) = 1.
(d) Show using parts (b) and (c) that A2 (8, 4, 6) ≤ 4. Construct a binary constant weight
code of length 8 with four codewords of weight 6 and all with distance at least 4 apart,
thus showing that A2 (8, 4, 6) = 4.
(e) Use parts (b) and (d) to prove that A2 (9, 4, 6) ≤ 12.
(f) What are the bounds on A2 (9, 4, 6) and A2 (9, 4, 3) using the Unrestricted Johnson
Bound? Note that A2 (9, 4, 6) = A2 (9, 4, 3) by part (a).
(g) Show that A2 (9, 4, 6) = 12. Hint: By part (a), A2 (9, 4, 6) = A2 (9, 4, 3). A binary
(9, 12, 4) code with all codewords of weight 3 exists where, for each coordinate, there
are exactly four codewords with a 1 in that coordinate.
Exercise 101 Do the following:
(a) Use the techniques given in Exercise 100 to prove that A2 (8, 4, 5) ≤ 8.
(b) Show that A2 (8, 4, 5) = 8. Hint: By Exercise 100, A2 (8, 4, 5) = A2 (8, 4, 3). A binary
(8, 8, 4) code with all codewords of weight 3 exists where, for each coordinate, there
are exactly three codewords with a 1 in that coordinate.
2.3.3 The Johnson Bound for Aq(n, d)
The bounds on Aq (n, d, w) can be used to give upper bounds on Aq (n, d) also due to Johnson
[159]. As can be seen from the proof, these bounds strengthen the Sphere Packing Bound.
The idea of the proof is to count not only the vectors in Fqn that are within distance t =
⌊(d − 1)/2⌋ of all codewords (that is, the disjoint spheres of radius t centered at codewords)
but also the vectors at distance t + 1 from codewords that are not within these spheres. To
accomplish this we need the following notation. If C is a code of length n over Fq and
x ∈ Fqn , let d(C, x) denote the distance from x to C. So d(C, x) = min{d(c, x) | c ∈ C}.
Theorem 2.3.8 (Johnson Bound for A_q(n, d)) Let t = ⌊(d − 1)/2⌋.
(i) If d is odd, then

  A_q(n, d) ≤ q^n / [ Σ_{i=0}^{t} (n choose i)(q − 1)^i + ( (n choose t+1)(q − 1)^{t+1} − (d choose t) A_q(n, d, d) ) / A_q(n, d, t + 1) ].

(ii) If d is even, then

  A_q(n, d) ≤ q^n / [ Σ_{i=0}^{t} (n choose i)(q − 1)^i + (n choose t+1)(q − 1)^{t+1} / A_q(n, d, t + 1) ].

(iii) If d is odd, then

  A_2(n, d) ≤ 2^n / [ Σ_{i=0}^{t} (n choose i) + ( (n choose t+1) − (d choose t) A_2(n, d, d) ) / ⌊n/(t + 1)⌋ ].

(iv) If d is even, then

  A_2(n, d) ≤ 2^n / [ Σ_{i=0}^{t} (n choose i) + (n choose t+1) / ⌊n/(t + 1)⌋ ].    (2.12)

(v) If d is odd, then

  A_2(n, d) ≤ 2^n / [ Σ_{i=0}^{t} (n choose i) + ( (n choose t+1) − (n choose t) ⌊(n − t)/(t + 1)⌋ ) / ⌊n/(t + 1)⌋ ].    (2.13)
Proof: Let C be an (n, M, d) code over F_q. Notice that t is the packing radius of C; d = 2t + 1 if d is odd and d = 2t + 2 if d is even. So the spheres of radius t centered at codewords are disjoint. The vectors in these spheres are precisely the vectors in F_q^n that are distance t or less from C. We will count these vectors together with those vectors at distance t + 1 from C and use this count to obtain our bounds. To that end, let N be the set of vectors at distance t + 1 from C; so N = {x ∈ F_q^n | d(C, x) = t + 1}. Let |N| denote the size of N. Therefore,

  M Σ_{i=0}^{t} (n choose i)(q − 1)^i + |N| ≤ q^n,    (2.14)

as the summation on the left-hand side counts the vectors in the spheres of radius t centered at codewords; see (1.11). Our bounds will emerge after we obtain a lower bound on |N|. Let X = {(c, x) ∈ C × N | d(c, x) = t + 1}. To get the lower bound on |N| we obtain lower and upper estimates for |X|.
We first obtain a lower estimate on |X|. Let X_c = {x ∈ N | (c, x) ∈ X}. Then

  |X| = Σ_{c∈C} |X_c|.    (2.15)

Fix c ∈ C. Let x ∈ F_q^n be a vector at distance t + 1 from c, so that wt(c − x) = t + 1. There are exactly

  (n choose t+1)(q − 1)^{t+1}

such vectors x because they are obtained by freely changing any t + 1 coordinates of c. Some of these lie in X_c and some do not. Because wt(c − x) = t + 1, d(C, x) ≤ t + 1. Let c′ ∈ C with c′ ≠ c. Then by the triangle inequality of Theorem 1.4.1, d ≤ wt(c′ − c) = wt(c′ − x − (c − x)) ≤ wt(c′ − x) + wt(c − x) = wt(c′ − x) + t + 1, implying

  d − t − 1 ≤ wt(c′ − x).    (2.16)

If d = 2t + 2, wt(c′ − x) ≥ t + 1, yielding d(C, x) = t + 1 since c′ ∈ C was arbitrary and we saw previously that d(C, x) ≤ t + 1. Therefore all such x lie in X_c, giving

  |X_c| = (n choose t+1)(q − 1)^{t+1};

hence

  |X| = M (n choose t+1)(q − 1)^{t+1}   if d = 2t + 2.    (2.17)
If d = 2t + 1, wt(c′ − x) ≥ t by (2.16), yielding t ≤ d(C, x) ≤ t + 1. As we only want to count the x where d(C, x) = t + 1, we will throw away those with d(C, x) = t. Such x must simultaneously be at distance t from some codeword c′ ∈ C and at distance t + 1 from c. Hence the distance from c′ to c is at most 2t + 1 = d, by the triangle inequality; this distance must also be at least d as that is the minimum distance of C. Therefore we have wt(c′ − c) = 2t + 1. How many c′ are possible? As the set {c′ − c | c′ ∈ C} forms a constant weight code of length n and minimum distance d, whose codewords have weight d, there are at most A_q(n, d, d) such c′. For each c′, how many x are there with wt(x − c) = t + 1 and t = wt(c′ − x) = wt((c′ − c) − (x − c))? Since wt(c′ − c) = 2t + 1, x − c is obtained from c′ − c by arbitrarily choosing t of its 2t + 1 nonzero components and making them zero. This can be done in (d choose t) ways. Therefore

  (n choose t+1)(q − 1)^{t+1} − (d choose t) A_q(n, d, d) ≤ |X_c|,

showing by (2.15) that

  M ( (n choose t+1)(q − 1)^{t+1} − (d choose t) A_q(n, d, d) ) ≤ |X|   if d = 2t + 1.    (2.18)
We are now ready to obtain our upper estimate on |X |. Fix x ∈ N . How many c ∈ C
are there with d(c, x) = t + 1? The set {c − x | c ∈ C with d(c, x) = t + 1} is a constant
weight code of length n with words of weight t + 1 and minimum distance d because
(c′ − x) − (c − x) = c′ − c. Thus for each x ∈ N there are at most Aq (n, d, t + 1) choices
for c with d(c, x) = t + 1. Hence |X| ≤ |N| A_q(n, d, t + 1), or

  |X| / A_q(n, d, t + 1) ≤ |N|.    (2.19)
We obtain bound (i) by combining (2.14), (2.18), and (2.19), and bound (ii) by combining (2.14), (2.17), and (2.19). Bounds (iii) and (iv) follow from (i) and (ii) by observing that A_2(n, 2t + 1, t + 1) = A_2(n, 2t + 2, t + 1) = ⌊n/(t + 1)⌋ by Theorem 2.3.3. Finally, bound (v) follows from (iii) and the observation (with details left as Exercise 102) that

  (d choose t) A_2(n, d, d) = (d choose t) A_2(n, 2t + 1, 2t + 1) = (d choose t) A_2(n, 2t + 2, 2t + 1) ≤ (n choose t) ⌊(n − t)/(t + 1)⌋

by Theorem 2.3.3(iv) and the Unrestricted Johnson Bound.
Exercise 102 Show that

  (d choose t) A_2(n, 2t + 2, 2t + 1) ≤ (n choose t) ⌊(n − t)/(t + 1)⌋,

using the Unrestricted Johnson Bound.
Example 2.3.9 Using (2.12) we compute that an upper bound for A_2(16, 6) is 263. Recall that A_2(16, 6) = A_2(15, 5). Using (2.13), we discover that A_2(15, 5), and hence A_2(16, 6), is bounded above by 256. In the next subsection, we present a code that meets this bound.
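The two computations in Example 2.3.9 involve a rational denominator, so exact arithmetic matters. A Python sketch (the function name is ours) evaluates bounds (2.12) and (2.13) with fractions from the standard library:

```python
from fractions import Fraction
from math import comb

def johnson_bound_binary(n, d):
    """Johnson Bound on A_2(n, d): bound (2.12) for even d, bound (2.13) for odd d."""
    t = (d - 1) // 2
    denom = Fraction(sum(comb(n, i) for i in range(t + 1)))
    if d % 2 == 0:                                   # (2.12)
        denom += Fraction(comb(n, t + 1), n // (t + 1))
    else:                                            # (2.13)
        numer = comb(n, t + 1) - comb(n, t) * ((n - t) // (t + 1))
        denom += Fraction(numer, n // (t + 1))
    return int(Fraction(2 ** n) / denom)             # floor, as every quantity is positive

assert johnson_bound_binary(16, 6) == 263   # Example 2.3.9
assert johnson_bound_binary(15, 5) == 256   # Example 2.3.9
```

Note that 256 is also exactly the Sphere Packing value for a hypothetical perfect code here: for (15, 5) the denominator of (2.13) works out to the integer 128.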
Exercise 103 Compute the best possible upper bounds for A2 (9, 4) = A2 (8, 3) and
A2 (13, 6) = A2 (12, 5) using the Johnson Bound. Compare these values to those in
Table 2.1.
2.3.4 The Nordstrom–Robinson code
The existence of the Nordstrom–Robinson code shows that the upper bound on A2 (16, 6)
discovered in Example 2.3.9 is met, and hence that A2 (16, 6) = 256.
The Nordstrom–Robinson code was discovered by Nordstrom and Robinson [247] and
later independently by Semakov and Zinov’ev [303]. This code can be defined in several
ways; one of the easiest is the following and is due to Goethals [100] and Semakov and
Zinov’ev [304].
Let C be the [24, 12, 8] extended binary Golay code chosen to contain the weight 8 codeword c = 11···100···0. Let T be the set consisting of the first eight coordinates. Let C(T) be the subcode of C which is zero on T, let C_T be C shortened on T, and let C^T be C punctured on the positions of T̄ = {9, 10, . . . , 24}. By Corollary 1.4.14, as C is self-dual, the first seven coordinate positions of C are linearly independent. Thus as c ∈ C and C is self-dual, C^T is the [8, 7, 2] binary code consisting of all even weight vectors of length 8. Exercise 104 shows that the dimension of C_T is 5. Hence C_T is a [16, 5, 8] code. (In fact C_T is equivalent to R(1, 4), as Exercise 121 shows.) For 1 ≤ i ≤ 7, let c_i ∈ C be a codeword of C with zeros in the first eight coordinates except coordinate i and coordinate 8; such codewords are present in C because C^T is all length 8 even weight vectors. Let c_0 = 0. For 0 ≤ j ≤ 7, let C_j be the coset c_j + C(T) of C(T) in the extended Golay code C. These cosets are distinct, as you can verify in Exercise 105. Let N be the union of the eight cosets C_0, . . . , C_7. The Nordstrom–Robinson code N_16 is the code obtained by puncturing N on T. Thus N_16 is the union of C*_0, . . . , C*_7, where C*_j is C_j punctured on T. Figure 2.1 gives a picture of the construction. N_16 is a (16, 256) code, as Exercise 106 shows.
Let a, b ∈ N be distinct. Then d(a, b) ≥ 8, as C has minimum distance 8. Since a and b
  coset   pattern on T   remaining 16 coordinates
  C_0     00000000       32 codewords of C*_0
  C_1     10000001       32 codewords of C*_1
  C_2     01000001       32 codewords of C*_2
  ...     ...            ...
  C_7     00000011       32 codewords of C*_7
Figure 2.1 The Nordstrom–Robinson code inside the extended Golay code.
disagree on at most two of the first eight coordinates, the codewords of N 16 obtained from
a and b by puncturing on T are distance 6 or more apart. Thus N 16 has minimum distance
at least 6 showing that A2 (16, 6) = 256. In particular, N 16 is optimal.
Exercise 104 Show that C_T has dimension 5. Hint: C = C^⊥; apply Theorem 1.5.7(iii).
Exercise 105 Show that the cosets C j for 0 ≤ j ≤ 7 are distinct.
Exercise 106 Show that N 16 has 256 codewords as claimed.
We compute the weight distribution A_i(N_16) of N_16. Clearly

  Σ_{j=0}^{7} A_i(C*_j) = A_i(N_16).    (2.20)

By Theorem 1.4.5(iv), C contains the all-one codeword 1. Hence as c + 1 has 0s in the first eight coordinates and 1s in the last 16, C*_0 contains the all-one vector of length 16. By Exercise 107, A_{16−i}(C*_j) = A_i(C*_j) for 0 ≤ i ≤ 16 and 0 ≤ j ≤ 7. As C*_j is obtained from C by deleting eight coordinates on which the codewords have even weight, A_i(C*_j) = 0 if i is odd. By construction N_16 contains 0. As N_16 has minimum distance 6, we deduce that A_i(N_16) = 0 for 1 ≤ i ≤ 5 and therefore that A_i(C*_j) = 0 for 1 ≤ i ≤ 5 and 11 ≤ i ≤ 15. As C*_0 = C_T and the weights of codewords in C(T) are multiples of 4, so are the weights of vectors in C*_0. Since C*_0 has 32 codewords, A_0(C*_0) = A_16(C*_0) = 1 and A_8(C*_0) = 30, the other A_i(C*_0) being 0. For 1 ≤ j ≤ 7, the codewords in C_j have weights that are multiples of 4; since each codeword has two 1s in the first eight coordinates, the vectors in C*_j have weights that are congruent to 2 modulo 4. Therefore the only possible weights of vectors in C*_j are 6 and 10, and since A_6(C*_j) = A_10(C*_j), these both must be 16. Therefore by (2.20), A_0(N_16) = A_16(N_16) = 1, A_6(N_16) = A_10(N_16) = 7 · 16 = 112, and A_8(N_16) = 30, the other A_i(N_16) being 0.
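The bookkeeping in (2.20) can be double-checked with a few lines of Python, assembling the distribution coset by coset exactly as in the argument above:

```python
# Weight distribution of N_16 assembled coset by coset, following the text.
A_C0 = {0: 1, 8: 30, 16: 1}   # C*_0 = C_T: 32 words, weights divisible by 4
A_Cj = {6: 16, 10: 16}        # each C*_j, 1 <= j <= 7: weights congruent to 2 mod 4

A_N16 = dict(A_C0)
for weight, count in A_Cj.items():
    A_N16[weight] = A_N16.get(weight, 0) + 7 * count  # seven cosets C*_1, ..., C*_7

assert A_N16 == {0: 1, 6: 112, 8: 30, 10: 112, 16: 1}
assert sum(A_N16.values()) == 256   # all of N_16 accounted for
```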
Exercise 107 Prove that A_{16−i}(C*_j) = A_i(C*_j) for 0 ≤ i ≤ 16 and 0 ≤ j ≤ 7.
It turns out that N 16 is unique [317] in the following sense. If C is any binary (16, 256, 6)
code, and c is a codeword of C, then the code c + C = {c + x | x ∈ C} is also a (16, 256, 6)
code containing the zero vector (see Exercise 86) and this code is equivalent to N 16 .
Exercise 108 From N 16 , produce (15, 128, 6), (14, 64, 6), and (13, 32, 6) codes. Note that
these codes are optimal; see Table 2.1 and Exercise 92.
2.3.5 Nearly perfect binary codes
We explore the case when bound (2.13) is met. This bound strengthens the Sphere Packing
Bound and the two bounds in fact agree precisely when (t + 1) | (n − t). Recall that codes
that meet the Sphere Packing Bound are called perfect. An (n, M, 2t + 1) binary code with
M = A2 (n, 2t + 1) which attains the Johnson Bound (2.13) is called nearly perfect.
A natural problem is to classify the nearly perfect codes. As just observed, the Johnson Bound strengthens the Sphere Packing Bound, and so perfect codes are nearly perfect (and (t + 1) | (n − t)); see Exercise 109. Nearly perfect codes were first examined by
Semakov, Zinov’ev, and Zaitsev [305] and independently by Goethals and Snover [101].
The next two examples, found in [305, Theorem 1], give parameters for other nearly perfect
codes.
Exercise 109 As stated in the text, because the Johnson Bound strengthens the Sphere
Packing Bound, perfect codes are nearly perfect. Fill in the details showing why this is
true.
Example 2.3.10 Let C be an (n, M, 3) nearly perfect code. So t = 1. If n is odd, (t + 1) | (n − t) and so the Sphere Packing Bound and (2.13) agree. Thus C is a perfect single error-correcting code and must have the parameters of H_{2,r} by Theorem 1.12.3. (We do not actually need Theorem 1.12.3, because the Sphere Packing Bound gives M = 2^n/(1 + n); so n = 2^r − 1 for some r and M = 2^{2^r − 1 − r}.) If n is even, equality in (2.13) produces

  M = 2^n / (n + 2).

Hence M is an integer if and only if n = 2^r − 2 for some integer r. Therefore the only possible sets of parameters for nearly perfect (n, M, 3) codes that are not perfect are (n, M, 3) = (2^r − 2, 2^{2^r − 2 − r}, 3) for r ≥ 3. For example, the code obtained by puncturing the subcode of even weight codewords in H_{2,r} is a linear code having these parameters. (See Exercise 110.)
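The divisibility argument in Example 2.3.10 is quick to confirm numerically; in this Python sketch (the function name is ours), M = 2^n/(n + 2) is an integer exactly when n + 2 is a power of 2:

```python
def nearly_perfect_size_d3(n):
    """M = 2^n/(n + 2) forced by equality in (2.13) when d = 3 and n is even;
    returns None when this is not an integer."""
    return 2 ** n // (n + 2) if 2 ** n % (n + 2) == 0 else None

# M is an integer exactly when n + 2 is a power of 2, i.e. n = 2^r - 2:
for n in range(2, 40, 2):
    is_integer = nearly_perfect_size_d3(n) is not None
    assert is_integer == ((n + 2) & (n + 1) == 0)   # n + 2 is a power of 2

assert nearly_perfect_size_d3(6) == 8      # r = 3: parameters (6, 8, 3)
assert nearly_perfect_size_d3(14) == 1024  # r = 4: parameters (14, 1024, 3)
```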
Exercise 110 Prove that the code obtained by puncturing the subcode of even weight codewords in H_{2,r} is a linear code having parameters (n, M, 3) = (2^r − 2, 2^{2^r − 2 − r}, 3).
Example 2.3.11 Let C be an (n, M, 5) nearly perfect code. So t = 2. If n ≡ 2 (mod 3), (t + 1) | (n − t) and again the Sphere Packing Bound and (2.13) agree. Thus C is a perfect double error-correcting code, which does not exist by Theorem 1.12.3. If n ≡ 1 (mod 3), equality in (2.13) yields

  M = 2^{n+1} / ((n + 2)(n + 1)).

So for M to be an integer, both n + 1 and n + 2 must be powers of 2, which is impossible for n ≥ 1. Finally, consider the case n ≡ 0 (mod 3). Equality in (2.13) gives

  M = 2^{n+1} / (n + 1)².

So n = 2^m − 1 for some m, and as 3 | n, m must be even. Thus C is a (2^m − 1, 2^{2^m − 2m}, 5) code, a code that has the same parameters as the punctured Preparata code P(m)*, which we will describe in Chapter 12.
These two examples provide the initial steps in the classification of the nearly perfect codes, a work begun by Semakov, Zinov'ev, and Zaitsev [305] and completed by Lindström [199, 200]; one can also define nearly perfect nonbinary codes. These authors show that all of the nearly perfect binary codes are either perfect, or have parameters of either the codes in Example 2.3.10 or the punctured Preparata codes in Example 2.3.11, and that all nearly perfect nonbinary codes must be perfect.
2.4 The Singleton Upper Bound and MDS codes
The next upper bound for Aq (n, d) and Bq (n, d), called the Singleton Bound, is much simpler to prove than the previous upper bounds. It is a rather weak bound in general, but it leads to the class of codes called MDS codes; this class contains the very important family of Reed–Solomon codes, which are useful in many applications: they correct burst errors and provide the high fidelity in CD players.
Theorem 2.4.1 (Singleton Bound [312]) For d ≤ n,

  A_q(n, d) ≤ q^{n−d+1}.

Furthermore, if an [n, k, d] linear code over F_q exists, then k ≤ n − d + 1.
Proof: The second statement follows from the first by Theorem 2.1.1. Recall that A_q(n, n) = q by Theorem 2.1.5, yielding the bound when d = n. Now assume that d < n. By Theorem 2.1.6, A_q(n, d) ≤ q A_q(n − 1, d). Inductively we have that A_q(n, d) ≤ q^{n−d} A_q(d, d). Since A_q(d, d) = q, A_q(n, d) ≤ q^{n−d+1}.
Example 2.4.2 The hexacode of Example 1.3.4 is a [6, 3, 4] linear code over F_4. In this code, k = 3 = 6 − 4 + 1 = n − d + 1 and the Singleton Bound is met. So A_4(6, 4) = 4³.
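The Singleton Bound itself is a one-liner; a short Python sketch (the function name is ours) checks Example 2.4.2 and illustrates how weak the bound can be compared with the Plotkin value found earlier:

```python
def singleton_bound(n, d, q):
    """Singleton Bound: A_q(n, d) <= q^(n - d + 1)."""
    return q ** (n - d + 1)

# The [6, 3, 4] hexacode over F_4 meets the bound: A_4(6, 4) = 4^3 = 64.
assert singleton_bound(6, 4, 4) == 64
# For binary codes the bound is often far from tight: it gives A_2(18, 10) <= 2^9 = 512,
# while the Plotkin Bound in Example 2.2.3 gives A_2(18, 10) <= 10.
assert singleton_bound(18, 10, 2) == 512
```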
Exercise 111 Prove using either the parity check matrix or the standard form of the generator
matrix for an [n, k, d] linear code that d ≤ n − k + 1, hence verifying directly the linear
version of the Singleton Bound.
A code for which equality holds in the Singleton Bound is called maximum distance
separable, abbreviated MDS. No code of length n and minimum distance d has more
codewords than an MDS code with parameters n and d; equivalently, no code of length n
with M codewords has a larger minimum distance than an MDS code with parameters n
and M. We briefly discuss some results on linear MDS codes.
Theorem 2.4.3 Let C be an [n, k] code over Fq with k ≥ 1. Then the following are
equivalent:
(i) C is MDS.
(ii) Every set of k coordinates is an information set for C.
(iii) C ⊥ is MDS.
(iv) Every set of n − k coordinates is an information set for C ⊥ .
Proof: The first two statements are equivalent by Theorem 1.4.15 as an [n, k] code is MDS
if and only if k = n − d + 1. Similarly the last two are equivalent. Finally, (ii) and (iv) are
equivalent by Theorem 1.6.2.
Bounds on the size of codes
We say that C is a trivial MDS code over Fq if and only if C = F_q^n or C is monomially
equivalent to the code generated by 1 or its dual. By examining the generator matrix in
standard form, it is straightforward to verify the following result about binary codes.
Theorem 2.4.4 Let C be an [n, k, d] binary code.
(i) If C is MDS, then C is trivial.
(ii) If 3 ≤ d and 5 ≤ k, then k ≤ n − d − 1.
Exercise 112 Prove Theorem 2.4.4.
We will discuss other aspects of MDS codes in Section 7.4. Trivial MDS codes are
arbitrarily long. Examples of nontrivial MDS codes are Reed–Solomon codes over Fq of
length n = q − 1 and extensions of these codes of lengths q and q + 1. Reed–Solomon
codes and their generalizations will be examined in Chapter 5. The weight distribution of
an MDS code over Fq is determined by its parameters n, k, and q (see Theorem 7.4.1). If an
MDS code is nontrivial, its length is bounded as a function of q and k (see Corollary 7.4.4).
2.5
The Elias Upper Bound
The ideas of the proof of the Plotkin Bound can be used to find a bound that applies to a larger
range of minimum distances. This extension was discovered in 1960 by Elias but he did not
publish it. Unfortunately, the Elias Bound is sometimes rather weak; see Exercises 114 and
115. However, the Elias Bound is important because the asymptotic form of this bound,
derived later in this chapter, is superior to all of the asymptotic bounds we discuss except
the MRRW Bounds. Before stating the Elias Bound, we need two lemmas.
Lemma 2.5.1 Let C be an (n, K , d) code over Fq such that all codewords have weights at
most w, where w ≤ rn with r = 1 − q^{−1}. Then
  d ≤ (Kw/(K − 1)) (2 − w/(rn)).
Proof: As in the proof of the Plotkin Bound, let M be the K × n matrix whose rows are
the codewords of C. For 1 ≤ i ≤ n, let n i,α be the number of times α ∈ Fq occurs in column
i of M. Clearly,
  \sum_{α∈F_q} n_{i,α} = K  for 1 ≤ i ≤ n.   (2.21)
Also, if T = \sum_{i=1}^{n} n_{i,0}, then
  T = \sum_{i=1}^{n} n_{i,0} ≥ K(n − w)   (2.22)
as every row of M has at least n − w zeros. By the Cauchy–Schwarz inequality and (2.21),
  \sum_{α∈F_q^*} n_{i,α}^2 ≥ (1/(q − 1)) (\sum_{α∈F_q^*} n_{i,α})^2 = (K − n_{i,0})^2/(q − 1)   (2.23)
and
  \sum_{i=1}^{n} n_{i,0}^2 ≥ (1/n) (\sum_{i=1}^{n} n_{i,0})^2 = (1/n) T^2.   (2.24)
Again, exactly as in the proof of the Plotkin Bound, using (2.3) and (2.4),
  K(K − 1)d ≤ \sum_{x∈C} \sum_{y∈C} d(x, y) = \sum_{i=1}^{n} \sum_{α∈F_q} n_{i,α}(K − n_{i,α}).
Using first (2.21), then (2.23), and finally (2.24),
  \sum_{i=1}^{n} \sum_{α∈F_q} n_{i,α}(K − n_{i,α}) = nK^2 − \sum_{i=1}^{n} (n_{i,0}^2 + \sum_{α∈F_q^*} n_{i,α}^2)   (2.25)
    ≤ nK^2 − (1/(q − 1)) \sum_{i=1}^{n} (q n_{i,0}^2 + K^2 − 2K n_{i,0})
    ≤ nK^2 − (1/(q − 1)) ((q/n) T^2 + nK^2 − 2KT).   (2.26)
Since w ≤ rn, qw ≤ (q − 1)n implying n ≤ qn − qw and hence
  K ≤ (q/n) K(n − w).   (2.27)
Also as w ≤ rn, by (2.22),
  K ≤ (q/n) T.   (2.28)
Adding (2.27) and (2.28) gives 2K ≤ (q/n)(T + K(n − w)). Multiplying both sides by
T − K(n − w), which is nonnegative by (2.22), produces
  2K[T − K(n − w)] ≤ (q/n)[T^2 − K^2(n − w)^2]
and hence
  (q/n) K^2(n − w)^2 − 2K^2(n − w) ≤ (q/n) T^2 − 2KT.
Substituting this into (2.26) and using (2.25) yields
  K(K − 1)d ≤ nK^2 − (1/(q − 1)) [(q/n) K^2(n − w)^2 + nK^2 − 2K^2(n − w)].
Simplifying the right-hand side produces
  K(K − 1)d ≤ K^2 w (2 − (q/(q − 1))(w/n)) = K^2 w (2 − w/(rn)).
Solving for d verifies the lemma.
By (1.11) the number of vectors in a sphere of radius a in F_q^n centered at some vector in
F_q^n, denoted Vq (n, a), is
  Vq (n, a) = \sum_{i=0}^{a} \binom{n}{i} (q − 1)^i.   (2.29)
Lemma 2.5.2 Suppose C is an (n, M, d) code over Fq . Then there is an (n, M, d) code C′
over Fq with an (n, K , d) subcode A containing only codewords of weight at most w such
that K ≥ M Vq (n, w)/q^n.
Proof: Let Sw (0) be the sphere in Fqn of radius w centered at 0. Let x ∈ Fqn be chosen so
that |Sw (0) ∩ (x + C)| is maximal. Then
  |Sw (0) ∩ (x + C)| ≥ (1/q^n) \sum_{y∈F_q^n} |Sw (0) ∩ (y + C)|
    = (1/q^n) \sum_{y∈F_q^n} \sum_{b∈Sw (0)} \sum_{c∈C} |{b} ∩ {y + c}|
    = (1/q^n) \sum_{b∈Sw (0)} \sum_{c∈C} 1 = (1/q^n) |Sw (0)||C| = (1/q^n) Vq (n, w)M.
The result follows by letting C ′ = x + C and A = Sw (0) ∩ C ′ .
Theorem 2.5.3 (Elias Bound) Let r = 1 − q^{−1}. Suppose that w ≤ rn and w^2 − 2rnw +
rnd > 0. Then
  Aq (n, d) ≤ (rnd/(w^2 − 2rnw + rnd)) · (q^n/Vq (n, w)).
Proof: Let M = Aq (n, d). By Lemma 2.5.2 there is an (n, K , d) code over Fq containing
only codewords of weight at most w such that
  M Vq (n, w)/q^n ≤ K.   (2.30)
As w ≤ rn, Lemma 2.5.1 implies that d ≤ Kw(2 − w/(rn))/(K − 1). Solving for K and
using w^2 − 2rnw + rnd > 0 yields
  K ≤ rnd/(w^2 − 2rnw + rnd).   (2.31)
Putting (2.30) and (2.31) together gives the bound.
Example 2.5.4 By Theorem 2.1.2, A2 (13, 5) = A2 (14, 6). By the Sphere Packing
Bound, A2 (13, 5) ≤ 2048/23 implying A2 (13, 5) ≤ 89, and A2 (14, 6) ≤ 8192/53 implying
A2 (14, 6) ≤ 154. The Johnson Bound yields A2 (13, 5) ≤ 8192/105, showing A2 (13, 5) ≤
78; and A2 (14, 6) ≤ 16 384/197, showing A2 (14, 6) ≤ 83. The following table gives the
upper bounds on A2 (13, 5) and A2 (14, 6) using the Elias Bound. Note that each w ≤ n/2
such that w^2 − nw + nd/2 > 0 for (n, d) = (13, 5) and (n, d) = (14, 6) must be tried.

  w    A2 (13, 5)    A2 (14, 6)
  0      8192          16 384
  1       927           1581
  2       275            360
  3       281            162
  4                      233

Thus the best upper bound from the Elias Bound for A2 (13, 5) = A2 (14, 6) is 162, while the
best bound from the Sphere Packing and Johnson Bounds is 78. By Table 2.1, A2 (13, 5) =
A2 (14, 6) = 64; see also Exercises 92 and 108.
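The table above can be reproduced mechanically. The following Python sketch (the function names are ours) evaluates the Elias Bound of Theorem 2.5.3 in exact integer arithmetic by clearing the denominator q coming from r = 1 − q^{−1}:

```python
from math import comb

def sphere_size(q: int, n: int, a: int) -> int:
    """V_q(n, a), the number of vectors in a radius-a Hamming sphere in F_q^n, per (2.29)."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(a + 1))

def elias_bound(q: int, n: int, d: int, w: int):
    """Elias Bound (Theorem 2.5.3) for a given w, or None when w is inadmissible.
    Multiplying numerator and denominator by q turns w^2 - 2rnw + rnd
    (with r = 1 - 1/q) into the integer num below, so the result is an exact floor."""
    num = q * w * w - 2 * (q - 1) * n * w + (q - 1) * n * d
    if q * w > (q - 1) * n or num <= 0:
        return None  # needs w <= rn and w^2 - 2rnw + rnd > 0
    return (q - 1) * n * d * q ** n // (num * sphere_size(q, n, w))

# Reproduce the table of Example 2.5.4.
assert [elias_bound(2, 13, 5, w) for w in range(4)] == [8192, 927, 275, 281]
assert [elias_bound(2, 14, 6, w) for w in range(5)] == [16384, 1581, 360, 162, 233]
```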
Exercise 113 Verify that the entries in the table of Example 2.5.4 are correct and that all
possible values of w have been examined.
Exercise 114 By Theorem 2.1.2, A2 (9, 5) = A2 (10, 6).
(a) Compute upper bounds on both A2 (9, 5) and A2 (10, 6) using the Sphere Packing Bound,
the Plotkin Bound, and the Elias Bound. When computing the Elias Bound make sure
all possible values of w have been checked.
(b) What is the best upper bound for A2 (9, 5) = A2 (10, 6)?
(c) Find a binary code of length 10 and minimum distance 6 meeting the bound in part
(b). Hint: This can be constructed using the zero vector with the remaining codewords
having weight 6. (Note: This verifies the entry in Table 2.1.)
Exercise 115 Prove that when w = rn the condition w^2 − 2rnw + rnd > 0 becomes
rn < d and the Elias Bound is weaker than the Plotkin Bound in this case.
2.6
The Linear Programming Upper Bound
The next upper bound that we present is the linear programming bound which uses results of
Delsarte [61, 62, 63]. In general, this is the most powerful of the bounds we have presented
but, as its name signifies, does require the use of linear programming. In order to present this
bound, we introduce two concepts. First, we generalize the notion of weight distribution of
a code. The (Hamming) distance distribution or inner distribution of a code C of length n
is the list Bi = Bi (C) for 0 ≤ i ≤ n, where
  Bi (C) = (1/|C|) \sum_{c∈C} |{v ∈ C | d(v, c) = i}|.
By Exercise 117, the distance distribution and weight distribution of a linear code are
identical. In particular, if C is linear, the distance distribution is a list of nonnegative integers;
if C is nonlinear, the Bi are nonnegative but need not be integers. We will see the distance
distribution again in Chapter 12. Second, we define the Krawtchouck polynomial K_k^{n,q}(x)
of degree k to be
  K_k^{n,q}(x) = \sum_{j=0}^{k} (−1)^j (q − 1)^{k−j} \binom{x}{j} \binom{n−x}{k−j}  for 0 ≤ k ≤ n.
In 1957, Lloyd [206], in his work on perfect codes, was the first to use the Krawtchouck polynomials in connection with coding theory. We will see related applications of Krawtchouck
polynomials in Chapters 7 and 12.
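For numerical experiments it is convenient to evaluate Krawtchouck polynomials directly from the definition. A minimal Python sketch (the helper is ours; it relies on the convention that a binomial coefficient whose lower index exceeds its upper index is 0):

```python
from math import comb

def krawtchouk(n: int, q: int, k: int, x: int) -> int:
    """K_k^{n,q}(x) at an integer x, following the definition above;
    math.comb returns 0 when its lower argument exceeds its upper one."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

# Spot-check the symmetry of Exercise 116(b): K_k^{n,2}(w) = (-1)^w K_{n-k}^{n,2}(w).
n = 9
assert all(krawtchouk(n, 2, k, w) == (-1) ** w * krawtchouk(n, 2, n - k, w)
           for k in range(n + 1) for w in range(n + 1))
```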
Exercise 116 Let q = 2. Then K_k^{n,2}(x) = \sum_{j=0}^{k} (−1)^j \binom{x}{j} \binom{n−x}{k−j}.
(a) Prove that if w is an integer with 0 ≤ w ≤ n, then
  K_k^{n,2}(w) = \sum_{j=0}^{n} (−1)^j \binom{w}{j} \binom{n−w}{k−j} = \sum_{j=0}^{w} (−1)^j \binom{w}{j} \binom{n−w}{k−j}.
Hint: Observe when some of the binomial coefficients are 0.
(b) Prove that K_k^{n,2}(w) = (−1)^w K_{n−k}^{n,2}(w). Hint: By part (a),
  K_k^{n,2}(w) = \sum_{j=0}^{w} (−1)^j \binom{w}{j} \binom{n−w}{k−j}
and hence
  K_{n−k}^{n,2}(w) = \sum_{j=0}^{w} (−1)^j \binom{w}{j} \binom{n−w}{n−k−j}.
In one of the summations replace j by w − j and use \binom{r}{s} = \binom{r}{r−s}.
Exercise 117 Let Bi , 0 ≤ i ≤ n, be the distance distribution of an (n, M) code C over
Fq .
(a) Prove that \sum_{i=0}^{n} Bi = M.
(b) Prove that B0 = 1.
(c) Prove that if q = 2, Bn ≤ 1.
(d) Prove that the distance distribution and weight distribution of C are identical if C is
linear.
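The distance distribution is straightforward to compute by brute force for small codes. A Python sketch (the helper is ours) illustrating parts (a) and (d) of Exercise 117, together with a nonlinear code whose Bi are not integers:

```python
def distance_distribution(code):
    """Inner distribution B_0, ..., B_n of a code given as equal-length tuples."""
    n, M = len(code[0]), len(code)
    B = [0] * (n + 1)
    for c in code:
        for v in code:
            B[sum(a != b for a, b in zip(c, v))] += 1
    return [b / M for b in B]

# Linear example: the [3, 1] binary repetition code, where B_i equals the
# weight distribution A_i: B_0 = 1, B_3 = 1.
assert distance_distribution([(0, 0, 0), (1, 1, 1)]) == [1.0, 0.0, 0.0, 1.0]

# Nonlinear example: the B_i are nonnegative but B_1 = 4/3 is not an integer.
B = distance_distribution([(0, 0), (0, 1), (1, 1)])
assert abs(B[1] - 4 / 3) < 1e-12 and abs(sum(B) - 3) < 1e-12
```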
As with the other upper bounds presented so far, the Linear Programming Bound applies
to codes that may be nonlinear. In fact, the alphabet over which the code is defined is not
required to be a finite field; the Linear Programming Bound depends only on the code
parameters, including the alphabet size, but not the specific alphabet. This is an advantage
as the preliminary lemmas needed to derive the bound can most easily be proved if we
use Zq , the integers modulo q, as the alphabet. To facilitate this, define α: Fq → Zq to be
any bijection from Fq to Zq with α(0) = 0. Of course if q is a prime, we can choose α to
be the identity. If c = (c1 , c2 , . . . , cn ) ∈ Fqn , define α(c) = (α(c1 ), α(c2 ), . . . , α(cn )) ∈ Zqn .
If C is an (n, M) code over Fq , define α(C) = {α(c) | c ∈ C}. As α(0) = 0,
if c and v are in Fqn , then wt(c) = wt(α(c)) and d(c, v) = d(α(c), α(v)) implying that the
weight and distance distributions of C are identical to the weight and distance distributions,
respectively, of α(C). In particular, when replacing Zq with any alphabet with q elements,
this discussion shows the following.
Theorem 2.6.1 There exists an (n, M) code C over Fq if and only if there exists an (n, M)
code over any alphabet with q elements having the same distance distribution as C.
This theorem shows that any of the bounds on the size of a (possibly) nonlinear code over
Fq that we have derived apply to codes over any alphabet with q elements. As we will see
in Chapter 12, codes over Zq have been studied extensively.
In Z_q^n define the ordinary inner product as done over fields, namely, u · v = u_1 v_1 + · · · +
u_n v_n. As with fields, let Z_q^* = Z_q \ {0}.
Lemma 2.6.2 Let ξ = e^{2πi/q} in the complex numbers C, where i = √−1. Let u ∈ Z_q^n with
wt(u) = w. Then
  \sum_{v∈Z_q^n, wt(v)=k} ξ^{u·v} = K_k^{n,q}(w).
Proof: Rearrange coordinates so that u = u_1 u_2 · · · u_w 0 · · · 0, where u_m ≠ 0 for 1 ≤ m ≤
w. Let A = {a_1, a_2, . . . , a_k} be a set of k coordinates satisfying
  1 ≤ a_1 < a_2 < · · · < a_j ≤ w < a_{j+1} < · · · < a_k.
Let S = {v ∈ Z_q^n | wt(v) = k with nonzero coordinates exactly in A}. Then
  \sum_{v∈Z_q^n, wt(v)=k} ξ^{u·v} = \sum_{j=0}^{k} \sum_{A} \sum_{v∈S} ξ^{u·v}.   (2.32)
The lemma follows if we show that the inner sum \sum_{v∈S} ξ^{u·v} always equals (−1)^j (q − 1)^{k−j},
as there are \binom{w}{j} \binom{n−w}{k−j} choices for A.
Before we do this, notice that
  \sum_{v=1}^{q−1} ξ^{uv} = −1  if u ∈ Z_q with u ≠ 0.   (2.33)
This is because \sum_{v=0}^{q−1} ξ^{uv} = ((ξ^u)^q − 1)/(ξ^u − 1) = 0 as ξ^q = 1. Examining the desired
inner sum of (2.32), we obtain
  \sum_{v∈S} ξ^{u·v} = \sum_{v_{a_1}∈Z_q^*} \sum_{v_{a_2}∈Z_q^*} · · · \sum_{v_{a_k}∈Z_q^*} ξ^{u_{a_1}v_{a_1}} ξ^{u_{a_2}v_{a_2}} · · · ξ^{u_{a_k}v_{a_k}}
    = (q − 1)^{k−j} \sum_{v_{a_1}∈Z_q^*} \sum_{v_{a_2}∈Z_q^*} · · · \sum_{v_{a_j}∈Z_q^*} ξ^{u_{a_1}v_{a_1}} ξ^{u_{a_2}v_{a_2}} · · · ξ^{u_{a_j}v_{a_j}}.   (2.34)
The last equality follows because u_{a_{j+1}} = u_{a_{j+2}} = · · · = u_{a_k} = 0. But using (2.33),
  \sum_{v_{a_1}∈Z_q^*} \sum_{v_{a_2}∈Z_q^*} · · · \sum_{v_{a_j}∈Z_q^*} ξ^{u_{a_1}v_{a_1}} ξ^{u_{a_2}v_{a_2}} · · · ξ^{u_{a_j}v_{a_j}} = \prod_{m=1}^{j} \sum_{v=1}^{q−1} ξ^{u_{a_m}v} = (−1)^j.
Combining this with (2.34), we have \sum_{v∈S} ξ^{u·v} = (−1)^j (q − 1)^{k−j} as required.
In Theorem 7.2.3 we will show that if C is a linear code of length n over Fq with
weight distribution A_w for 0 ≤ w ≤ n, then A_k^⊥ = (1/|C|) \sum_{w=0}^{n} A_w K_k^{n,q}(w) for 0 ≤ k ≤ n
is the weight distribution of C^⊥. As C is linear, A_w = B_w by Exercise 117. In particular,
for linear codes \sum_{w=0}^{n} B_w K_k^{n,q}(w) ≥ 0. The next lemma, which is the basis of the Linear
Programming Bound, shows that this inequality holds where B_w is the distance distribution
of a (possibly) nonlinear code.
Lemma 2.6.3 Let B_w , 0 ≤ w ≤ n, be the distance distribution of a code over Fq . Then
  \sum_{w=0}^{n} B_w K_k^{n,q}(w) ≥ 0   (2.35)
for 0 ≤ k ≤ n.
Proof: By Theorem 2.6.1 we only need to verify these inequalities for an (n, M) code C
over Z_q. By definition of the distance distribution and Lemma 2.6.2,
  M \sum_{w=0}^{n} B_w K_k^{n,q}(w) = \sum_{w=0}^{n} \sum_{(x,y)∈C^2, d(x,y)=w} K_k^{n,q}(w) = \sum_{w=0}^{n} \sum_{(x,y)∈C^2, d(x,y)=w} \sum_{v∈Z_q^n, wt(v)=k} ξ^{(x−y)·v}
    = \sum_{(x,y)∈C^2} \sum_{v∈Z_q^n, wt(v)=k} ξ^{x·v} ξ^{−y·v} = \sum_{v∈Z_q^n, wt(v)=k} (\sum_{x∈C} ξ^{x·v}) (\sum_{y∈C} ξ^{−y·v})
    = \sum_{v∈Z_q^n, wt(v)=k} |\sum_{x∈C} ξ^{x·v}|^2 ≥ 0,
proving the result.
If C is an (n, M, d) code over Fq with distance distribution B_w , 0 ≤ w ≤ n, then M =
\sum_{w=0}^{n} B_w and B_0 = 1 by Exercise 117. Also B_1 = B_2 = · · · = B_{d−1} = 0. By Lemma 2.6.3,
we also have \sum_{w=0}^{n} B_w K_k^{n,q}(w) ≥ 0 for 0 ≤ k ≤ n. However, this inequality is merely
\sum_{w=0}^{n} B_w ≥ 0 when k = 0 as K_0^{n,q}(w) = 1, which is clearly already true. If q = 2, again by
Exercise 117, B_n ≤ 1, and furthermore if d is even, by Theorem 2.1.2, we may also assume
that B_w = 0 when w is odd. By Exercise 116, when w is even, K_k^{n,2}(w) = K_{n−k}^{n,2}(w). Thus
the kth inequality in (2.35) is the same as the (n − k)th as the only (possibly) nonzero B_w s
are when w is even. This discussion yields our bound.
Theorem 2.6.4 (Linear Programming Bound) The following hold:
(i) When q ≥ 2, Aq (n, d) ≤ max{\sum_{w=0}^{n} B_w }, where the maximum is taken over all B_w
subject to the following conditions:
(a) B_0 = 1 and B_w = 0 for 1 ≤ w ≤ d − 1,
(b) B_w ≥ 0 for d ≤ w ≤ n, and
(c) \sum_{w=0}^{n} B_w K_k^{n,q}(w) ≥ 0 for 1 ≤ k ≤ n.
(ii) When d is even and q = 2, A2 (n, d) ≤ max{\sum_{w=0}^{n} B_w }, where the maximum is taken
over all B_w subject to the following conditions:
(a) B_0 = 1 and B_w = 0 for 1 ≤ w ≤ d − 1 and all odd w,
(b) B_w ≥ 0 for d ≤ w ≤ n and B_n ≤ 1, and
(c) \sum_{w=0}^{n} B_w K_k^{n,2}(w) ≥ 0 for 1 ≤ k ≤ ⌊n/2⌋.
Solving the inequalities of this theorem is accomplished by linear programming, hence
the name. At times other inequalities can be added to the list which add more constraints to
the linear program and reduce the size of \sum_{w=0}^{n} B_w . In specific cases other variations to the
Linear Programming Bound can be performed to achieve a smaller upper bound. Many of the
upper bounds in Table 2.1 come from the Linear Programming Bound and these variations.
Example 2.6.5 We apply the Linear Programming Bound to obtain an upper bound on
A2 (8, 3). By Theorem 2.1.2, A2 (8, 3) = A2 (9, 4). Hence we apply the Linear Programming
Bound (ii) with q = 2, n = 9, and d = 4. Thus we are trying to find a solution to max{1 +
B4 + B6 + B8 } where

    9 +  B4 − 3B6 −  7B8 ≥ 0
   36 − 4B4       + 20B8 ≥ 0
   84 − 4B4 + 8B6 − 28B8 ≥ 0
  126 + 6B4 − 6B6 + 14B8 ≥ 0   (2.36)

with B4 ≥ 0, B6 ≥ 0, and B8 ≥ 0. The unique solution to this linear program is B4 =
18, B6 = 24/5, and B8 = 9/5. Hence max{1 + B4 + B6 + B8 } = 1 + (123/5), implying
A2 (9, 4) ≤ 25.
We can add two more inequalities to (2.36). Let C be a (9, M, 4) code, and let x ∈ C.
Define C x = x + C; note that C x is also a (9, M, 4) code. The number of codewords in C at
distance 8 from x is the same as the number of vectors of weight 8 in C x . As there is clearly
no more than one vector of weight 8 in a (9, M, 4) code, we have
B8 ≤ 1.
(2.37)
Also the number of codewords at distance 6 from x is the same as the number of vectors of
weight 6 in C x ; this number is at most A2 (9, 4, 6), which is 12 by Exercise 100. Furthermore,
if there is a codeword of weight 8 in C x , then every vector in C x of weight 6 has a 1 in the
coordinate where the weight 8 vector is 0. This means that the number of weight 6 vectors
in C x is the number of weight 5 vectors in the code obtained by puncturing C x on this
coordinate. That number is at most A2 (8, 4, 5), which is 8 by Exercise 101. Putting these
together shows that
B6 + 4B8 ≤ 12.
(2.38)
Including inequalities (2.37) and (2.38) with (2.36) gives a linear program which when
solved yields the unique solution B4 = 14, B6 = 16/3, and B8 = 1 implying max{1 +
B4 + B6 + B8 } = 64/3. Thus A2 (9, 4) ≤ 21. By further modifying the linear program (see
[218, Chapter 17]) it can be shown that A2 (9, 4) ≤ 20. In fact A2 (9, 4) = 20; see Table 2.1.
See also Exercise 103.
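The linear program (2.36) can be solved with any LP package; since it has only three variables it can also be solved exactly without one. The following Python sketch (our own helpers, standard library only) enumerates the vertices of the feasible region with rational arithmetic. The region here is bounded (the second inequality gives B4 ≤ 9 + 5B8, and the first then forces 3B6 + 2B8 ≤ 18), so the maximum of the linear objective is attained at a vertex:

```python
from fractions import Fraction
from itertools import combinations

# Constraints a . (B4, B6, B8) >= b: the four inequalities of (2.36),
# followed by nonnegativity of B4, B6, B8.
constraints = [
    ((1, -3, -7), -9),
    ((-4, 0, 20), -36),
    ((-4, 8, -28), -84),
    ((6, -6, 14), -126),
    ((1, 0, 0), 0),
    ((0, 1, 0), 0),
    ((0, 0, 1), 0),
]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def vertex(three):
    """Solve the 3x3 system with the three chosen constraints tight (Cramer's rule)."""
    A = [list(a) for a, _ in three]
    b = [rhs for _, rhs in three]
    D = det3(A)
    if D == 0:
        return None
    point = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = b[i]
        point.append(Fraction(det3(Aj), D))
    return point

best = None
for three in combinations(constraints, 3):
    x = vertex(three)
    if x is not None and all(sum(ai * xi for ai, xi in zip(a, x)) >= rhs
                             for a, rhs in constraints):
        value = 1 + sum(x)  # the objective 1 + B4 + B6 + B8
        best = value if best is None or value > best else best

assert best == Fraction(128, 5)  # = 1 + 123/5 = 25.6, so A_2(9, 4) <= 25
```

Adding the extra inequalities (2.37) and (2.38) to the constraint list and rerunning gives the improved bound of the example.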
Exercise 118 Find an upper bound on A2 (9, 3) = A2 (10, 4) as follows. (It may be helpful
to use a computer algebra program that can perform linear programming.)
(a) Give the Sphere Packing, Johnson, and Elias Bounds for A2 (9, 3) and A2 (10, 4).
(b) Apply the Linear Programming Bound (ii) with n = 10, q = 2, and d = 4.
(c) Prove that B8 + 5B10 ≤ 5.
(d) Combine the inequalities from the Linear Programming Bound (ii) with the inequality
in (c) to obtain an upper bound for A2 (10, 4). Is it an improvement over the bound found
in (b)?
(e) What is the best bound on A2 (9, 3) = A2 (10, 4) from parts (a), (b), and (d)? How does
it compare to the value in Table 2.1?
(f) Assuming the value of A2 (9, 4) in Table 2.1 is correct, what bound do you get on
A2 (9, 3) = A2 (10, 4) from Theorem 2.1.6?
Exercise 119 Do the following to obtain bounds on A2 (13, 5) = A2 (14, 6). (It may be
helpful to use a computer algebra program that can perform linear programming.)
(a) Give the Sphere Packing, Johnson, and Elias Bounds for A2 (13, 5) and A2 (14, 6).
(b) Apply the Linear Programming Bound (ii) with n = 14, q = 2, and d = 6 to obtain an
upper bound for A2 (14, 6).
(c) What is the best bound on A2 (13, 5) = A2 (14, 6) from parts (a) and (b)? How does it
compare to the value in Table 2.1?
(d) Assuming the value of A2 (13, 6) in Table 2.1 is correct, what bound do you get on
A2 (13, 5) = A2 (14, 6) from Theorem 2.1.6?
2.7
The Griesmer Upper Bound
The final upper bound we discuss is a generalization of the Singleton Bound known as the
Griesmer Bound. We place it last because, unlike our other upper bounds, this one applies
only to linear codes.
To prove this bound we first discuss the generally useful idea of a residual code due to
H. J. Helgert and R. D. Stinaff [120]. Let C be an [n, k] code and let c be a codeword of
weight w. Let the set of coordinates on which c is nonzero be I. Then the residual code
of C with respect to c, denoted Res(C, c), is the code of length n − w punctured on all the
coordinates of I. The next result gives a lower bound for the minimum distance of residual
codes [131].
Theorem 2.7.1 Let C be an [n, k, d] code over Fq and let c be a codeword of weight w <
(q/(q − 1))d. Then Res(C, c) is an [n − w, k − 1, d ′ ] code, where d ′ ≥ d − w + ⌈w/q⌉.
Proof: By replacing C by a monomially equivalent code, we may assume that c =
11 · · · 100 · · · 0. Since puncturing c on its nonzero coordinates gives the zero vector,
Res(C, c) has dimension less than k. Assume that the dimension is strictly less than k − 1.
Then there exists a nonzero codeword x = x1 · · · xn ∈ C which is not a multiple of c with
xw+1 · · · xn = 0. There exists α ∈ Fq such that at least w/q coordinates of x1 · · · xw equal
α. Therefore
w(q − 1)
w
=
,
d ≤ wt(x − αc) ≤ w −
q
q
contradicting our assumption on w. Hence Res(C, c) has dimension k − 1.
We now establish the lower bound for d ′ . Let xw+1 · · · xn be any nonzero codeword in
Res(C, c), and let x = x1 · · · xw xw+1 · · · xn be a corresponding codeword in C. There exists
α ∈ Fq such that at least w/q coordinates of x1 · · · xw equal α. So
  d ≤ wt(x − αc) ≤ w − w/q + wt(x_{w+1} · · · x_n ).
Thus every nonzero codeword of Res(C, c) has weight at least d − w + ⌈w/q⌉.
Applying Theorem 2.7.1 to a codeword of minimum weight we obtain the following.
Corollary 2.7.2 If C is an [n, k, d] code over Fq and c ∈ C has weight d, then Res(C, c) is
an [n − d, k − 1, d ′ ] code, where d ′ ≥ ⌈d/q⌉.
Recall that the Nordstrom–Robinson code defined in Section 2.3.4 is a nonlinear binary
code of length 16 and minimum distance 6, with 256 = 2^8 codewords, and, as we described,
its existence together with the Johnson Bound (see Example 2.3.9) implies that A2 (16, 6) =
2^8. It is natural to ask whether B2 (16, 6) also equals 2^8. In the next example we illustrate
how residual codes can be used to show that no [16, 8, 6] binary linear code exists, thus
implying that B2 (16, 6) ≤ 2^7.
Example 2.7.3 Let C be a [16, 8, 6] binary linear code. Let C 1 be the residual code of C
with respect to a weight 6 vector. By Corollary 2.7.2, C 1 is a [10, 7, d ′ ] code with 3 ≤ d ′ ;
by the Singleton Bound d ′ ≤ 4. If d ′ = 4, C 1 is a nontrivial binary MDS code, which is
impossible by Theorem 2.4.4. So d ′ = 3. Notice that we have now reduced the problem to
showing the nonexistence of a [10, 7, 3] code. But the nonexistence of this code follows
from the Sphere Packing Bound as
  2^7 > 2^{10} / (\binom{10}{0} + \binom{10}{1}).
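Residual codes are easy to experiment with. The following Python sketch (our own helpers, for binary codes given by a generator matrix) punctures every codeword on the support of a chosen codeword, and checks the conclusion of Theorem 2.7.1 on the [7, 4, 3] Hamming code rather than on the nonexistent [16, 8, 6] code:

```python
from itertools import product

def codewords(G):
    """All F_2-codewords spanned by the rows of generator matrix G."""
    k, n = len(G), len(G[0])
    words = set()
    for coeffs in product([0, 1], repeat=k):
        words.add(tuple(sum(c * g[j] for c, g in zip(coeffs, G)) % 2
                        for j in range(n)))
    return words

def residual(G, c):
    """Res(C, c): puncture every codeword of C on the support of c."""
    keep = [j for j, cj in enumerate(c) if cj == 0]
    return {tuple(w[j] for j in keep) for w in codewords(G)}

# [7, 4, 3] Hamming code; take a weight-3 codeword c (here w = 3 < 2d = 6).
# Theorem 2.7.1 predicts Res(C, c) is a [4, 3, d'] code with
# d' >= d - w + ceil(w/2) = 2.
G = [(1, 0, 0, 0, 0, 1, 1),
     (0, 1, 0, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1, 0),
     (0, 0, 0, 1, 1, 1, 1)]
c = (1, 0, 0, 0, 0, 1, 1)
R = residual(G, c)
assert len(R) == 2 ** 3                       # dimension k - 1 = 3
assert min(sum(w) for w in R if any(w)) >= 2  # minimum distance at least 2
```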
Exercise 120 In Exercise 93, we showed that B2 (13, 6) ≤ 2^5. Show using residual codes
that B2 (13, 6) ≤ 2^4. Also construct a code that meets this bound.
Exercise 121 Do the following:
(a) Use the residual code to prove that a [16, 5, 8] binary code contains the all-one codeword 1.
(b) Prove that a [16, 5, 8] binary code has weight distribution A0 = A16 = 1 and A8 = 30.
(c) Prove that all [16, 5, 8] binary codes are equivalent.
(d) Prove that R(1, 4) is a [16, 5, 8] binary code.
Theorem 2.7.4 (Griesmer Bound [112]) Let C be an [n, k, d] code over Fq with k ≥ 1.
Then
  n ≥ \sum_{i=0}^{k−1} ⌈d/q^i⌉.
Proof: The proof is by induction on k. If k = 1 the conclusion clearly holds. Now assume
that k > 1 and let c ∈ C be a codeword of weight d. By Corollary 2.7.2, Res(C, c) is an
[n − d, k − 1, d′] code, where d′ ≥ ⌈d/q⌉. Applying the inductive assumption to Res(C, c),
we have n − d ≥ \sum_{i=0}^{k−2} ⌈d/q^{i+1}⌉ and the result follows.
Since ⌈d/q^0⌉ = d and ⌈d/q^i⌉ ≥ 1 for i = 1, . . . , k − 1, the Griesmer Bound implies the
linear case of the Singleton Bound.
The Griesmer Bound gives a lower bound on the length of a code over Fq with a prescribed
dimension k and minimum distance d. The Griesmer Bound does provide an upper bound
on Bq (n, d) because, given n and d, there is a largest k for which the Griesmer Bound holds.
Then Bq (n, d) ≤ q^k.
Given k, d, and q there need not exist an [n, k, d] code over Fq which meets the Griesmer
Bound; that is, no code may exist where there is equality in the Griesmer Bound. For
example, by the Griesmer Bound, a binary code of dimension k = 12 and minimum distance
d = 7 has length n ≥ 22. Thus the [23, 12, 7] binary Golay code does not meet the Griesmer
Bound. But a [22, 12, 7] binary code does not exist because the Johnson Bound (2.13) gives
A2 (22, 7) ≤ ⌊2^{22}/2025⌋ = 2071, implying B2 (22, 7) ≤ 2^{11}.
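The Griesmer sum is simple to evaluate. A Python sketch (the function name is ours) reproducing the computation just made, together with a ternary case from Exercise 122:

```python
def griesmer_length(q: int, k: int, d: int) -> int:
    """g_q(k, d) = sum of ceil(d / q^i) for i = 0..k-1: the Griesmer lower
    bound on the length of an [n, k, d] code over F_q (Theorem 2.7.4)."""
    return sum(-(-d // q ** i) for i in range(k))  # -(-a // b) is ceil(a / b)

# A binary code with k = 12 and d = 7 needs n >= 22, so the [23, 12, 7]
# Golay code misses the Griesmer Bound by one coordinate.
assert griesmer_length(2, 12, 7) == 22
# The ternary [11, 6, 5] Golay code meets the bound.
assert griesmer_length(3, 6, 5) == 11
```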
It is natural to try to construct codes that meet the Griesmer Bound. We saw that the
[23, 12, 7] binary Golay code does not meet the Griesmer Bound. Neither does the [24, 12, 8]
extended binary Golay code, but both the [12, 6, 6] and [11, 6, 5] ternary Golay codes do.
(See Exercise 122.) In the next theorem, we show that the [(q^r − 1)/(q − 1), r] simplex
code meets the Griesmer Bound; we also show that all its nonzero codewords have weight
q^{r−1}, a fact we verified in Section 1.8 when q = 2.
Theorem 2.7.5 Every nonzero codeword of the r-dimensional simplex code over Fq has
weight q^{r−1}. The simplex codes meet the Griesmer Bound.
Proof: Let G be a generator matrix for the r-dimensional simplex code C over Fq . The
matrix G is formed by choosing for its columns a nonzero vector from each 1-dimensional
subspace of F_q^r. Because C = {xG | x ∈ F_q^r}, if x ≠ 0, then wt(xG) = n − s, where s is the
number of columns y of G such that x · y^T = 0. The set of vectors of F_q^r orthogonal to x is
an (r − 1)-dimensional subspace of F_q^r and thus exactly (q^{r−1} − 1)/(q − 1) columns y of G
satisfy x · y^T = 0. Thus wt(xG) = (q^r − 1)/(q − 1) − (q^{r−1} − 1)/(q − 1) = q^{r−1}, proving
that each nonzero codeword has weight q^{r−1}.
In particular, the minimum distance is q^{r−1}. Since
  \sum_{i=0}^{r−1} ⌈q^{r−1}/q^i⌉ = \sum_{i=0}^{r−1} q^i = (q^r − 1)/(q − 1),
the simplex codes meet the Griesmer Bound.
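Theorem 2.7.5 can be verified directly for small parameters. The following Python sketch (our own helper, restricted to prime fields so that arithmetic mod q works) builds a simplex generator matrix, taking as columns the vectors whose first nonzero entry is 1, one per 1-dimensional subspace, and checks that every nonzero codeword has weight q^{r−1}:

```python
from itertools import product

def simplex_weights(q: int, r: int):
    """Set of weights of all nonzero codewords of the r-dimensional simplex
    code over the prime field F_q."""
    # One nonzero column per 1-dimensional subspace of F_q^r.
    cols = [v for v in product(range(q), repeat=r)
            if any(v) and next(x for x in v if x) == 1]
    assert len(cols) == (q ** r - 1) // (q - 1)  # the code length
    weights = set()
    for x in product(range(q), repeat=r):
        if any(x):
            word = [sum(xi * yi for xi, yi in zip(x, y)) % q for y in cols]
            weights.add(sum(1 for s in word if s))
    return weights

# Every nonzero codeword has weight q^(r-1).
assert simplex_weights(2, 4) == {2 ** 3}
assert simplex_weights(3, 3) == {3 ** 2}
```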
Exercise 122 Prove that the [11, 6, 5] and [12, 6, 6] ternary Golay codes meet the Griesmer
Bound, but that the [24, 12, 8] extended binary Golay code does not.
Solomon and Stiffler [319] and Belov [15] each construct a family of codes containing
the simplex codes which meet the Griesmer Bound. Helleseth [122] has shown that in
many cases there are no other binary codes meeting the bound. For nonbinary fields, the
situation is much more complex. Projective geometries have also been used to construct
codes meeting the Griesmer Bound (see [114]).
In general, an [n, k, d] code may not have a basis of minimum weight vectors. However,
in the binary case, if the code meets the Griesmer Bound, it has such a generator matrix, as
the following result of van Tilborg [328] shows.
Theorem 2.7.6 Let C be an [n, k, d] binary code that meets the Griesmer Bound. Then C
has a basis of minimum weight codewords.
Proof: We proceed by induction on k. The result is clearly true if k = 1. Assume that c is
a codeword of weight d. By permuting coordinates, we may assume that C has a generator
matrix
  G = ( 1···1  0···0 )
      (  G0     G1  ),
where the first row is c and G 1 is a generator matrix of the [n − d, k − 1, d ′ ] residual code
C 1 = Res(C, c); d′ ≥ d1 = ⌈d/2⌉ by Corollary 2.7.2. As C meets the Griesmer Bound,
  n − d = \sum_{i=1}^{k−1} ⌈d/2^i⌉ = \sum_{i=0}^{k−2} ⌈d1/2^i⌉,   (2.39)
by Exercise 123. Suppose that d′ > d1. Then n − d < \sum_{i=0}^{k−2} ⌈d′/2^i⌉ by (2.39) and C 1 violates the Griesmer Bound. Therefore d′ = d1 and C 1 is an [n − d, k − 1, d1] code meeting
the Griesmer Bound. By induction, we may assume the rows of G 1 have weight d1. For
i ≥ 2, let r_i = (s_{i−1}, t_{i−1}) be row i of G, where s_{i−1} is row i − 1 of G 0 and t_{i−1} is row i − 1
of G 1. By Exercise 124, one of r_i or c + r_i has weight d. Hence C has a basis of weight d
codewords.
Exercise 123 Prove that for i ≥ 1,
  ⌈d/2^i⌉ = ⌈d1/2^{i−1}⌉,
where d1 = ⌈d/2⌉.
Exercise 124 In the notation of the proof of Theorem 2.7.6 show that one of ri or c + ri
has weight d.
Exercise 125 Prove that if d is even, a binary code meeting the Griesmer Bound has only
even weight codewords. Do not use Theorem 2.7.9.
This result has been generalized by Dodunekov and Manev [70]. Let
  g(k, d) = \sum_{i=0}^{k−1} ⌈d/2^i⌉   (2.40)
be the summation in the binary Griesmer Bound. The Griesmer Bound says that for an
[n, k, d] binary code to exist, n ≥ g(k, d). So n − g(k, d) is a measure of how close the
length of the code is to one that meets the Griesmer Bound. It also turns out to be a measure
of how much larger than minimum weight the weights of your basis vectors may need
to be.
Theorem 2.7.7 Let C be an [n, k, d] binary code with h = n − g(k, d). Then C has a basis
of codewords of weight at most d + h.
Proof: We proceed by induction on h. The case h = 0 is covered by Theorem 2.7.6. For
fixed h, proceed by induction on k. In the case k = 1, there certainly is a basis of one
codeword of weight d. When k > 1 assume that c is a codeword of weight d. By permuting
coordinates, we may assume that C has a generator matrix
  G = ( 1···1  0···0 )
      (  G0     G1  ),
where the first row is c and G 1 is a generator matrix of the [n − d, k − 1, d ′ ] residual code
C 1 = Res(C, c); d′ ≥ d1 = ⌈d/2⌉ by Corollary 2.7.2. Let d′ = d1 + ε. Using Exercise 126,
  g(k − 1, d′) = \sum_{i=0}^{k−2} ⌈d′/2^i⌉ = \sum_{i=0}^{k−2} ⌈(d1 + ε)/2^i⌉
    ≥ \sum_{i=0}^{k−2} ⌈d1/2^i⌉ + \sum_{i=0}^{k−2} ⌊ε/2^i⌋ = g(k − 1, d1) + \sum_{i=0}^{k−2} ⌊ε/2^i⌋.
As part of the proof of Theorem 2.7.6, we in effect showed that g(k − 1, d1) = g(k −
1, ⌈d/2⌉) = g(k, d) − d (see Exercise 127). As h = n − g(k, d),
  g(k − 1, d′) ≥ g(k − 1, d1) + \sum_{i=0}^{k−2} ⌊ε/2^i⌋ = n − d + (\sum_{i=0}^{k−2} ⌊ε/2^i⌋ − h).   (2.41)
Therefore as C 1 exists, n − d ≥ g(k − 1, d′). Let n − d − g(k − 1, d′) = h1 ≥ 0. By (2.41),
h1 ≤ h − \sum_{i=0}^{k−2} ⌊ε/2^i⌋. By induction, we may assume the rows of G 1 have weight at most
d′ + h1. For j ≥ 2, let r_j = (s_{j−1}, t_{j−1}) be row j of G, where s_{j−1} is row j − 1 of G 0 and
t_{j−1} is row j − 1 of G 1. So
  wt(t_{j−1}) ≤ d′ + h1 ≤ d1 + ε + h − \sum_{i=0}^{k−2} ⌊ε/2^i⌋ ≤ d1 + h = ⌈d/2⌉ + h,
since ε ≤ \sum_{i=0}^{k−2} ⌊ε/2^i⌋. By Exercise 128 one of r_j or c + r_j has weight between d and
d + h. Hence C has a basis of codewords of weights at most d + h.
Exercise 126 Let x and y be nonnegative real numbers. Show that ⌈x + y⌉ ≥ ⌈x⌉ +
⌊y⌋.
Exercise 127 Show that g(k − 1, ⌈d/2⌉) = g(k, d) − d.
Exercise 128 In the notation of the proof of Theorem 2.7.7 show that one of r j or c + r j
has weight between d and d + h.
Exercise 129
(a) Compute g(5, 4) from (2.40).
(b) What are the smallest weights that Theorem 2.7.7 guarantees can be used in a basis of
a [10, 5, 4] binary code?
(c) Show that a [10, 5, 4] binary code with only even weight codewords has a basis of
weight 4 codewords.
(d) Construct a [10, 5, 4] binary code with only even weight codewords.
Both Theorems 2.7.6 and 2.7.7 can be generalized in the obvious way to codes over
Fq ; see [68]. In particular, codes over Fq that meet the Griesmer Bound have a basis of
minimum weight codewords. This is not true of codes in general but is true for at least
one code with the same parameters, as the following theorem of Simonis [308] shows.
This result may prove to be useful in showing the nonexistence of linear codes with given
parameters [n, k, d].
Theorem 2.7.8 Suppose that there exists an [n, k, d] code C over Fq . Then there exists an
[n, k, d] code C ′ with a basis of codewords of weight d.
Proof: Let s be the maximum number of independent codewords {c1 , . . . , cs } in C of weight
d. Note that s ≥ 1 as C has minimum weight d. We are done if s = k. So assume s < k.
The theorem will follow by induction if we show that we can create from C an [n, k, d]
code C 1 with at least s + 1 independent codewords of weight d. Let S = span{c1 , . . . , cs }.
By the maximality of s, every vector in C \ S has weight greater than d. Let e1 be a minimum weight vector in C \ S with wt(e1 ) = d1 > d. Complete {c1 , . . . , cs , e1 } to a basis
{c1 , . . . , cs , e1 , e2 , . . . , ek−s } of C. Choose d1 − d nonzero coordinates of e1 and create e′1
to be the same as e1 except on these d1 − d coordinates, where it is 0. So wt(e′1 ) = d. Let
C 1 = span{c1 , . . . , cs , e′1 ,e2 , . . . , ek−s }. We show C 1 has minimum weight d and dimension k. The vectors in C 1 fall into two disjoint sets: S = span{c1 , . . . , cs } and C 1 \ S. The
nonzero codewords in S have weight d or more as S ⊂ C. The codewords in C 1 \ S are
obtained from those in C \ S by modifying d1 − d coordinates; therefore as C \ S has
minimum weight d1 , C 1 \ S has minimum weight at least d. So C 1 has minimum weight
d. Suppose that C 1 has dimension less than k. Then by our construction, e′1 must be
in span{c1 , . . . , cs , e2 , . . . , ek−s } ⊂ C. By maximality of s, e′1 must in fact be in S. So
e1 − e′1 ∈ C \ S as e1 ∉ S. By construction wt(e1 − e′1 ) = d1 − d; on the other hand as
e1 − e′1 ∈ C \ S, wt(e1 − e′1 ) ≥ d1 , since d1 is the minimum weight of C \ S. This contradiction shows that C 1 is an [n, k, d] code with at least s + 1 independent codewords of
weight d.
Exercise 130 Let C be the binary [9, 4] code with generator matrix
1 0 0 0 1 1 1 0 0
0 1 0 0 1 1 0 1 0
0 0 1 0 1 0 1 1 1.
0 0 0 1 0 1 1 1 1
(a) Find the weight distribution of C and show that the minimum weight of C is 4.
(b) Apply the technique of Theorem 2.7.8 to construct a [9, 4, 4] code with a basis of weight
4 vectors.
(c) Choose any three independent weight 4 vectors in C and any weight 5 vector in C.
Modify the latter vector by changing one of its 1s to 0. Show that these four weight 4
vectors always generate a [9, 4, 4] code.
Exercise 125 shows that a binary code meeting the Griesmer Bound has only even
weight codewords if d is even. The next theorem extends this result. The binary case is due
to Dodunekov and Manev [69] and the nonbinary case is due to Ward [346].
Theorem 2.7.9 Let C be a linear code over Fp, where p is a prime, which meets the Griesmer Bound. Assume that p^i | d. Then p^i divides the weights of all codewords of C; that is, p^i is a divisor of C.
2.8 The Gilbert Lower Bound
We now turn to lower bounds on Aq (n, d) and Bq (n, d). The Gilbert Bound is a lower bound
on Bq (n, d) and hence a lower bound on Aq (n, d).
Theorem 2.8.1 (Gilbert Bound [98])

    Bq(n, d) ≥ q^n / Σ_{i=0}^{d−1} (n choose i)(q − 1)^i.
Proof: Let C be a linear code over Fq with Bq(n, d) codewords. By Theorem 2.1.7 the covering radius of C is at most d − 1. Hence the spheres of radius d − 1 about the codewords cover Fq^n. By (2.29) a sphere of radius d − 1 centered at a codeword contains α = Σ_{i=0}^{d−1} (n choose i)(q − 1)^i vectors. As the Bq(n, d) spheres centered at codewords must fill the space Fq^n, Bq(n, d)·α ≥ q^n, giving the bound.
The Gilbert Bound can also be stated as

    Bq(n, d) ≥ q^(n − logq Σ_{i=0}^{d−1} (n choose i)(q − 1)^i).
We present this formulation so that it can be compared to the Varshamov Bound given in
the next section.
The proof of Theorem 2.8.1 suggests a nonconstructive “greedy” algorithm for producing
a linear code with minimum distance at least d which meets the Gilbert Bound:
(a) Begin with any nonzero vector c1 of weight at least d.
(b) While the covering radius of the linear code C i generated by {c1 , . . . , ci } is at least d,
choose any vector ci+1 in a coset of C i of weight at least d.
No matter how this algorithm is carried out, the resulting linear code has at least

    q^(n − logq Σ_{i=0}^{d−1} (n choose i)(q − 1)^i)

codewords.
For (possibly) nonlinear codes the greedy algorithm is even easier.
(a) Start with any vector in Fqn .
(b) Continue to choose a vector whose distance is at least d to all previously chosen vectors
as long as there are such vectors.
The result is again a code with minimum distance at least d (and covering radius at most
d − 1) which meets the Gilbert Bound.
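The nonlinear greedy construction can be sketched directly (illustrative code; the names are ours). Scanning F_2^5 in lexicographic order with d = 3 keeps four vectors, each pair at distance at least 3.

```python
from itertools import product

def greedy_code(n, d, q=2):
    """Greedy construction: scan F_q^n in a fixed (lexicographic) order and
    keep every vector at distance >= d from all vectors kept so far."""
    code = []
    for v in product(range(q), repeat=n):
        if all(sum(a != b for a, b in zip(v, c)) >= d for c in code):
            code.append(v)
    return code

C = greedy_code(5, 3)
print(len(C))  # 4
```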
Exercise 131 Show that the two greedy constructions described after the proof of the Gilbert Bound indeed yield codes with at least q^(n − logq Σ_{i=0}^{d−1} (n choose i)(q − 1)^i) codewords.
Exercise 132 Show that any (n, M, d) code with covering radius d − 1 or less meets the
Gilbert Bound.
2.9 The Varshamov Lower Bound
The Varshamov Bound is similar to the Gilbert Bound, and, in fact, asymptotically they are the same. The proof of the Varshamov Bound uses a lemma in which we show that if a code's parameters satisfy a certain inequality, then using a different greedy algorithm we can attach another column to the parity check matrix and increase the length, and therefore dimension, without decreasing the minimum distance.
Lemma 2.9.1 Let n, k, and d be integers with 2 ≤ d ≤ n and 1 ≤ k ≤ n, and let q be a
prime power. If

    Σ_{i=0}^{d−2} (n−1 choose i)(q − 1)^i < q^(n−k),    (2.42)
then there exists an (n − k) × n matrix H over Fq such that every set of d − 1 columns of
H is linearly independent.
Proof: We define a greedy algorithm for finding the columns h1, . . . , hn of H. From the set of all q^(n−k) column vectors of length n − k over Fq, choose:
(1) h1 to be any nonzero vector;
(2) h2 to be any vector that is not a multiple of h1;
    ⋮
(j) hj to be any vector that is not a linear combination of d − 2 (or fewer) of the vectors h1, . . . , hj−1;
    ⋮
(n) hn to be any vector that is not a linear combination of d − 2 (or fewer) of the vectors h1, . . . , hn−1.
If we can carry out this algorithm to completion, then h1 , . . . , hn are the columns of an
(n − k) × n matrix no d − 1 of which are linearly dependent. By Corollary 1.4.14, this
matrix is the parity check matrix for a linear code with minimum weight at least d. We show
that the construction can indeed be completed. Let j be an integer with 1 ≤ j ≤ n − 1 and
assume that vectors h1 , . . . , h j have been found. Since j ≤ n − 1, the number of different
linear combinations of d − 2 or fewer of h1, . . . , hj is

    Σ_{i=0}^{d−2} (j choose i)(q − 1)^i ≤ Σ_{i=0}^{d−2} (n−1 choose i)(q − 1)^i.

Hence if (2.42) holds, then there is some vector hj+1 which is not a linear combination of d − 2 (or fewer) of h1, . . . , hj. Thus the fact that h1, h2, . . . , hn can be chosen follows by induction on j.
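For binary codes the greedy column choice in the proof can be sketched as follows (our own function name; columns are encoded as integers, so a linear combination over F_2 is an XOR of bit masks).

```python
from itertools import combinations

def greedy_check_columns(n, k, d):
    """Binary version of the greedy algorithm in Lemma 2.9.1: pick n columns
    of length n - k (as integer bit masks) so that no d - 1 of them are
    linearly dependent over F_2."""
    cols = []
    for cand in range(1, 2 ** (n - k)):
        # spans = every XOR of d - 2 or fewer already-chosen columns
        spans = {0}
        for t in range(1, d - 1):
            for combo in combinations(cols, t):
                x = 0
                for c in combo:
                    x ^= c
                spans.add(x)
        if cand not in spans:
            cols.append(cand)
            if len(cols) == n:
                return cols
    return None  # inequality (2.42) fails for these parameters

print(greedy_check_columns(7, 4, 3))  # [1, 2, 3, 4, 5, 6, 7]
```

With d = 3 the condition just says each new column is nonzero and distinct from the earlier ones, so for n = 7, k = 4 the greedy scan picks the columns 1 through 7.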
The matrix H in Lemma 2.9.1 is the parity check matrix of a code of length n over Fq
that has dimension at least k and minimum distance at least d. Since the minimum distance
of a subcode of a code is at least the minimum distance of the code, we have the following
corollary.
Corollary 2.9.2 Let n, k, and d be integers with 2 ≤ d ≤ n and 1 ≤ k ≤ n. Then there
exists an [n, k] linear code over Fq with minimum distance at least d, provided
    1 + Σ_{i=0}^{d−2} (n−1 choose i)(q − 1)^i ≤ q^(n−k).    (2.43)
Theorem 2.9.3 (Varshamov Bound [337])

    Bq(n, d) ≥ q^(n − ⌈logq(1 + Σ_{i=0}^{d−2} (n−1 choose i)(q − 1)^i)⌉).
Proof: Let L be the left-hand side of (2.43). By Corollary 2.9.2, there exists an [n, k]
code over Fq with minimum weight at least d provided logq (L) ≤ n − k, or equivalently
k ≤ n − logq (L). The largest integer k satisfying this inequality is n − ⌈logq (L)⌉. Thus
Bq (n, d) ≥ q n−⌈logq (L)⌉ ,
giving the theorem.
2.10 Asymptotic bounds
In this section we will study some of the bounds from previous sections as the code lengths
go to infinity. The resulting bounds are called asymptotic bounds. Before beginning this
exploration, we need to define two terms. For a (possibly) nonlinear code over Fq with M
codewords the information rate, or simply rate, of the code is defined to be n −1 logq M.
Notice that if the code were actually an [n, k, d] linear code, it would contain M = q k
codewords and n −1 logq M = k/n; so for an [n, k, d] linear code, the ratio k/n is the rate
of the code consistent with the definition of “rate” in Section 1.11.2. In the linear case the
rate of a code is a measure of the number of information coordinates relative to the total
number of coordinates. The higher the rate, the higher the proportion of coordinates in a codeword that carry information rather than redundancy. If a linear or nonlinear code
of length n has minimum distance d, the ratio d/n is called the relative distance of the
code; the relative distance is a measure of the error-correcting capability of the code relative
to its length. Our asymptotic bounds will be either an upper or lower bound on the largest
possible rate for a family of (possibly nonlinear) codes over Fq of lengths going to infinity
with relative distances approaching δ. The function which determines this rate is
    αq(δ) = lim sup_{n→∞} n^(−1) logq Aq(n, δn).
The exact value of αq (δ) is unknown and hence we want upper and lower bounds on this
function. An upper bound on αq (δ) would indicate that all families with relative distances
approaching δ have rates, in the limit, at most this upper bound. A lower bound on αq (δ)
indicates that there exists a family of codes of lengths approaching infinity and relative
distances approaching δ whose rates are at least this bound. A number of upper and lower
bounds exist; we investigate six upper and one lower bound arising from the nonasymptotic bounds already presented in this chapter and in Section 1.12, beginning with the upper
bounds.
2.10.1 Asymptotic Singleton Bound
Our first asymptotic upper bound on αq (δ) is a simple consequence of the Singleton Bound;
we leave its proof as an exercise.
Theorem 2.10.1 (Asymptotic Singleton Bound) If 0 ≤ δ ≤ 1, then αq (δ) ≤ 1 − δ.
Exercise 133 Prove Theorem 2.10.1.
2.10.2 Asymptotic Plotkin Bound
The Plotkin Bound can be used to give an improved (smaller) upper bound on αq (δ) compared to the Asymptotic Singleton Bound.
Theorem 2.10.2 (Asymptotic Plotkin Bound) Let r = 1 − q^(−1). Then

    αq(δ) = 0            if r ≤ δ ≤ 1, and
    αq(δ) ≤ 1 − δ/r      if 0 ≤ δ ≤ r.
Proof: Note that the two formulas agree when δ = r . First, assume that r < δ ≤ 1. By the
Plotkin Bound (2.1), as r n < δn, Aq (n, δn) ≤ δn/(δn − r n) implying that 0 ≤ Aq (n, δn) ≤
δ/(δ − r ), independent of n. Thus αq (δ) = 0 follows immediately.
Now assume that 0 ≤ δ ≤ r. Suppose that C is an (n, M, δn) code with M = Aq(n, δn). We can shorten C in a manner analogous to that given in Section 1.5.3 as follows. Let n′ = ⌊(δn − 1)/r⌋; n′ < n as δ ≤ r. Fix an (n − n′)-tuple of elements from Fq. For at least one choice of this (n − n′)-tuple, there is a subset of at least M/q^(n−n′) codewords of C whose right-most n − n′ coordinates equal this (n − n′)-tuple. For this subset of C, puncture
the right-most n − n′ coordinates to form the code C′ of length n′ with M′ ≥ M/q^(n−n′) codewords, noting that distinct codewords in C remain distinct in C′; the minimum distance of C′ is at least that of C by our construction. This (n′, M′, δn) code satisfies rn′ < δn by our choice of n′. Applying the Plotkin Bound to C′ gives

    M/q^(n−n′) ≤ M′ ≤ δn/(δn − rn′) ≤ δn,

as δn − rn′ ≥ 1. Therefore Aq(n, δn) = M ≤ q^(n−n′) δn and so
    αq(δ) ≤ lim sup_{n→∞} n^(−1) logq(q^(n−n′) δn)
          = lim sup_{n→∞} (1 − n′/n + (logq δ)/n + (logq n)/n)
          = lim_{n→∞} (1 − n′/n) = 1 − δ/r.
This completes the proof.
Exercise 134 Draw the graphs of the inequalities given by the Asymptotic Singleton and
Asymptotic Plotkin Bounds when q = 2, where the horizontal axis is the relative distance δ
and the vertical axis is the rate R = αq (δ). Why is the Asymptotic Plotkin Bound stronger
than the Asymptotic Singleton Bound? Repeat this for the cases q = 3 and q = 4.
2.10.3 Asymptotic Hamming Bound
By the Asymptotic Plotkin Bound, when bounding αq (δ) we can assume that 0 ≤ δ < r =
1 − q −1 as otherwise αq (δ) = 0. There is an asymptotic bound derived from the Hamming
Bound (Sphere Packing Bound) that is superior to the Asymptotic Plotkin Bound on an
interval of values for δ. In order to derive this bound, we define the Hilbert entropy function
on 0 ≤ x ≤ r by
0
if x = 0,
Hq (x) =
x logq (q − 1) − x logq x − (1 − x) logq (1 − x)
if 0 < x ≤ r.
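In code, the entropy function is a direct transcription of the definition above (the name `Hq` is ours); a couple of sanity checks: H2(1/2) = 1 and H3(2/3) = 1.

```python
from math import log

def Hq(x, q=2):
    """q-ary entropy function H_q on [0, 1 - 1/q], as defined above."""
    if x == 0:
        return 0.0
    return (x * log(q - 1, q)
            - x * log(x, q)
            - (1 - x) * log(1 - x, q))

print(Hq(0.5))       # H_2(1/2) = 1
print(Hq(2 / 3, 3))  # H_3(2/3) = 1
```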
We will need to estimate factorials; this can be done with Stirling’s Formula [115, Chapter 9],
one version of which is
    n^(n+1/2) e^(−n+7/8) ≤ n! ≤ n^(n+1/2) e^(−n+1).
The results of Exercises 135 and 136 will also be needed.
Exercise 135 Let 0 < δ ≤ 1 − q^(−1).
(a) Show that n logq n − ⌊δn⌋ logq ⌊δn⌋ − (n − ⌊δn⌋) logq(n − ⌊δn⌋) ≤ −(δn − 1) logq(δ − 1/n) + logq n − n(1 − δ) logq(1 − δ) when δn > 2. Hint: (δ − 1/n)n = δn − 1 ≤ ⌊δn⌋ ≤ δn.
(b) Show that n logq n − ⌊δn⌋ logq ⌊δn⌋ − (n − ⌊δn⌋) logq(n − ⌊δn⌋) ≥ −logq n − δn logq δ − (n − δn + 1) logq(1 − δ + 1/n) when δn ≥ 1. Hint: n − ⌊δn⌋ ≤ n − (δ − 1/n)n and ⌊δn⌋ ≤ δn.
(c) Show that lim_{n→∞} n^(−1)(n logq n − ⌊δn⌋ logq ⌊δn⌋ − (n − ⌊δn⌋) logq(n − ⌊δn⌋)) = −δ logq δ − (1 − δ) logq(1 − δ).
Exercise 136 Let 0 < i ≤ δn, where 0 < δ ≤ 1 − q^(−1) and q ≥ 2. Prove that (n choose i−1)(q − 1)^(i−1) < (n choose i)(q − 1)^i.
Recall that a sphere of radius a in Fq^n centered at some vector in Fq^n contains

    Vq(n, a) = Σ_{i=0}^{a} (n choose i)(q − 1)^i    (2.44)

vectors.
vectors. The following lemma gives a relationship between Vq (n, a) and the entropy
function.
Lemma 2.10.3 Let 0 < δ ≤ 1 − q^(−1), where q ≥ 2. Then

    lim_{n→∞} n^(−1) logq Vq(n, ⌊δn⌋) = Hq(δ).
Proof: In (2.44) with a = ⌊δn⌋, the largest of the 1 + ⌊δn⌋ terms is the one with i = ⌊δn⌋ by Exercise 136. Thus

    (n choose ⌊δn⌋)(q − 1)^⌊δn⌋ ≤ Vq(n, ⌊δn⌋) ≤ (1 + ⌊δn⌋)(n choose ⌊δn⌋)(q − 1)^⌊δn⌋.

Taking logarithms and dividing by n gives

    A + n^(−1)⌊δn⌋ logq(q − 1) ≤ n^(−1) logq Vq(n, ⌊δn⌋) ≤ A + n^(−1)⌊δn⌋ logq(q − 1) + n^(−1) logq(1 + ⌊δn⌋),

where A = n^(−1) logq (n choose ⌊δn⌋). Therefore,

    lim_{n→∞} n^(−1) logq Vq(n, ⌊δn⌋) = lim_{n→∞} A + δ logq(q − 1)    (2.45)

as lim_{n→∞} n^(−1) logq(1 + ⌊δn⌋) = 0.
As (n choose ⌊δn⌋) = n!/(⌊δn⌋!(n − ⌊δn⌋)!), by Stirling's Formula,

    (n choose ⌊δn⌋) ≥ n^(n+1/2) e^(−n+7/8) / [⌊δn⌋^(⌊δn⌋+1/2) e^(−⌊δn⌋+1) (n − ⌊δn⌋)^(n−⌊δn⌋+1/2) e^(−n+⌊δn⌋+1)]

and

    (n choose ⌊δn⌋) ≤ n^(n+1/2) e^(−n+1) / [⌊δn⌋^(⌊δn⌋+1/2) e^(−⌊δn⌋+7/8) (n − ⌊δn⌋)^(n−⌊δn⌋+1/2) e^(−n+⌊δn⌋+7/8)];

hence

    B e^(−9/8) ≤ (n choose ⌊δn⌋) ≤ B e^(−3/4),

where

    B = n^(n+1/2) / [⌊δn⌋^(⌊δn⌋+1/2) (n − ⌊δn⌋)^(n−⌊δn⌋+1/2)].
Since

    lim_{n→∞} n^(−1) logq(B e^k) = lim_{n→∞} n^(−1)(logq B + k logq e) = lim_{n→∞} n^(−1) logq B,

we conclude that

    lim_{n→∞} A = lim_{n→∞} n^(−1) logq B
                = lim_{n→∞} n^(−1)[n logq n − ⌊δn⌋ logq ⌊δn⌋ − (n − ⌊δn⌋) logq(n − ⌊δn⌋)]
                  + lim_{n→∞} n^(−1)(1/2)[logq n − logq ⌊δn⌋ − logq(n − ⌊δn⌋)]
                = −δ logq δ − (1 − δ) logq(1 − δ) + 0

by Exercise 135. Plugging into (2.45), we obtain

    lim_{n→∞} n^(−1) logq Vq(n, ⌊δn⌋) = −δ logq δ − (1 − δ) logq(1 − δ) + δ logq(q − 1),

which is Hq(δ), proving the result.
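Lemma 2.10.3 can also be checked numerically. In the sketch below (our own helper names), the bit length of the integer Vq(n, ⌊δn⌋) approximates its base-2 logarithm to within one bit, avoiding floating-point overflow for large n; the normalized values approach H2(0.3) ≈ 0.8813.

```python
from math import comb, log2

def Vq(n, a, q=2):
    """V_q(n, a): size of a Hamming sphere of radius a in F_q^n."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(a + 1))

def H2(x):
    return 0.0 if x == 0 else -x * log2(x) - (1 - x) * log2(1 - x)

delta = 0.3
for n in (100, 1000, 5000):
    # bit_length approximates log2 of a huge integer to within one bit
    approx = Vq(n, int(delta * n)).bit_length() / n
    print(n, round(approx, 4))
print(round(H2(delta), 4))  # the limit predicted by Lemma 2.10.3
```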
Theorem 2.10.4 (Asymptotic Hamming Bound) Let 0 < δ ≤ 1 − q^(−1), where q ≥ 2. Then αq(δ) ≤ 1 − Hq(δ/2).
Proof: Note first that Aq(n, δn) = Aq(n, ⌈δn⌉) ≤ q^n / Vq(n, ⌊(⌈δn⌉ − 1)/2⌋) by the Sphere Packing Bound. If n ≥ N, then ⌊(⌈δn⌉ − 1)/2⌋ ≥ ⌊δn/2⌋ − 1 ≥ ⌊(δ − 2/N)n/2⌋. Thus Aq(n, δn) ≤ q^n / Vq(n, ⌊(δ − 2/N)n/2⌋), implying that

    αq(δ) = lim sup_{n→∞} n^(−1) logq Aq(n, δn)
          ≤ lim sup_{n→∞} (1 − n^(−1) logq Vq(n, ⌊(1/2)(δ − 2/N)n⌋))
          = 1 − Hq((1/2)(δ − 2/N))

by Lemma 2.10.3. But as n goes to infinity, we may let N get as large as we please, showing that αq(δ) ≤ 1 − Hq(δ/2) since Hq is continuous.
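The three asymptotic upper bounds so far can be compared at a few sample values of δ for q = 2 (a sketch; the function names are ours, and this is only a numerical spot-check, not the graphs requested in the exercise below).

```python
from math import log2

def H2(x):
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def singleton(delta):
    return 1 - delta

def plotkin(delta):                 # r = 1/2 for binary codes
    return max(0.0, 1 - 2 * delta)

def hamming(delta):
    return 1 - H2(delta / 2)

for delta in (0.1, 0.3, 0.45):
    print(delta, round(singleton(delta), 4), round(plotkin(delta), 4),
          round(hamming(delta), 4))
```

At δ = 0.3 the Hamming bound is the smallest of the three, while near δ = 1/2 the Plotkin bound wins, consistent with each bound being superior only on part of the interval.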
Exercise 137 Continuing with Exercise 134, add to the graphs drawn for q = 2, q = 3,
and q = 4 the graph of the inequality from the Asymptotic Hamming Bound. (A computer
graphing tool may be helpful.) In each case, for what values of δ is the Asymptotic Hamming Bound stronger than the Asymptotic Singleton Bound and the Asymptotic Plotkin
Bound?
2.10.4 Asymptotic Elias Bound
The Asymptotic Elias Bound surpasses the Asymptotic Singleton, Plotkin, and Hamming Bounds. As we know, αq(δ) = 0 for 1 − q^(−1) = r ≤ δ ≤ 1, so we only examine the case 0 < δ < r.
Theorem 2.10.5 (Asymptotic Elias Bound) Let 0 < δ < r = 1 − q^(−1), where q ≥ 2. Then αq(δ) ≤ 1 − Hq(r − √(r(r − δ))).
Proof: Choose x so that 0 < x < r − √(r(r − δ)). Then x² − 2rx + rδ > 0. Let w = ⌊xn⌋ and d = ⌊δn⌋. By Exercise 138, for n sufficiently large, w² − 2rnw + rnd > 0. As x < r, w < rn. Thus by the Elias Bound, for n large enough,

    Aq(n, δn) = Aq(n, d) ≤ [rnd / (w² − 2rnw + rnd)] · [q^n / Vq(n, w)].

So

    n^(−1) logq Aq(n, δn) ≤ n^(−1) logq [ r(d/n) / ((w/n)² − 2r(w/n) + r(d/n)) ] + 1 − n^(−1) logq Vq(n, ⌊xn⌋).

Observing that lim_{n→∞} d/n = δ and lim_{n→∞} w/n = x, by taking the limit of the above and using Lemma 2.10.3,

    αq(δ) ≤ 1 − Hq(x).

Since this is valid for all x with 0 < x < r − √(r(r − δ)) and Hq(x) is continuous, αq(δ) ≤ 1 − Hq(r − √(r(r − δ))).
Exercise 138 Assume x² − 2rx + rδ > 0, w = ⌊xn⌋, and d = ⌊δn⌋, where δ, x, and n are positive. Show that for n large enough, w² − 2rnw + rnd > 0. Hint: Obtain a lower bound on (w² − 2rnw + rnd)/n² by using xn − 1 < w ≤ xn and δn − 1 < d.
Exercise 139 Continuing with Exercise 137, add to the graphs drawn for q = 2, q = 3,
and q = 4 the graph of the inequality from the Asymptotic Elias Bound. (A computer
graphing tool may be helpful.) In each case, for what values of δ is the Asymptotic Elias
Bound stronger than the Asymptotic Singleton Bound, the Asymptotic Plotkin Bound, and
the Asymptotic Hamming Bound?
2.10.5 The MRRW Bounds
There are two asymptotic versions of the Linear Programming Bound that are generally
the best upper bounds on αq (δ). These bounds were discovered by McEliece, Rodemich,
Rumsey, and Welch [236]. As a result they are called the MRRW Bounds. The first of
these was originally developed for binary codes but has been generalized to codes over
any field; see Levenshtein [194, Theorem 6.19]. The second holds only for binary codes.
The First MRRW Bound, when considered only for binary codes, is a consequence of the Second MRRW Bound, as we see in Exercise 141. The proofs of these bounds, other than what is shown in Exercise 141, are beyond the scope of this text. Again αq(δ) = 0 for 1 − q^(−1) = r ≤ δ ≤ 1; the MRRW Bounds apply only when 0 < δ < r.
Theorem 2.10.6 (The First MRRW Bound) Let 0 < δ < r = 1 − q^(−1). Then

    αq(δ) ≤ Hq((1/q)[q − 1 − (q − 2)δ − 2√((q − 1)δ(1 − δ))]).

In particular if q = 2, then when 0 < δ < 1/2,

    α2(δ) ≤ H2(1/2 − √(δ(1 − δ))).
Theorem 2.10.7 (The Second MRRW Bound) Let 0 < δ < 1/2. Then

    α2(δ) ≤ min_{0≤u≤1−2δ} {1 + g(u²) − g(u² + 2δu + 2δ)},

where g(x) = H2((1 − √(1 − x))/2).
The Second MRRW Bound is better than the First MRRW Bound, for q = 2, when
δ < 0.272. Amazingly, if q = 2, the bounds agree when 0.273 ≤ δ ≤ 0.5. Exercise 142
shows that the Second MRRW Bound is strictly smaller than the Asymptotic Elias Bound
when q = 2.
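The minimization in the Second MRRW Bound is easy to carry out numerically. The sketch below (our own helper names; a plain grid search over u, not from the text) evaluates it and the Asymptotic Elias Bound at δ = 0.2 for comparison.

```python
from math import log2, sqrt

def H2(x):
    return 0.0 if x <= 0 or x >= 1 else -x * log2(x) - (1 - x) * log2(1 - x)

def g(x):
    x = min(max(x, 0.0), 1.0)   # guard against tiny floating-point overshoot
    return H2((1 - sqrt(1 - x)) / 2)

def mrrw2(delta, steps=2000):
    """Grid-search the Second MRRW Bound over 0 <= u <= 1 - 2*delta."""
    return min(1 + g(u * u) - g(u * u + 2 * delta * u + 2 * delta)
               for u in ((1 - 2 * delta) * s / steps for s in range(steps + 1)))

def elias(delta):
    r = 0.5
    return 1 - H2(r - sqrt(r * (r - delta)))

print(round(mrrw2(0.2), 4), round(elias(0.2), 4))
```

At δ = 0.2 the grid search returns a value strictly below the Elias bound, and no larger than the First MRRW value H2(1/2 − √(δ(1 − δ))), as Exercises 141 and 142 predict.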
Exercise 140 Continuing with Exercise 139, add to the graphs drawn for q = 2, q = 3, and
q = 4 the graph of the inequality from the First MRRW Bound. (A computer graphing tool
may be helpful.) In each case, for what values of δ is the First MRRW Bound stronger than
the Asymptotic Singleton Bound, the Asymptotic Plotkin Bound, the Asymptotic Hamming
Bound, and the Asymptotic Elias Bound?
Exercise 141 By the Second MRRW Bound, α2(δ) ≤ 1 + g((1 − 2δ)²) − g((1 − 2δ)² + 2δ(1 − 2δ) + 2δ). Verify that this is the First MRRW Bound when q = 2.
Exercise 142 This exercise shows that the Second MRRW Bound is strictly smaller than the Asymptotic Elias Bound when q = 2.
(a) By the Second MRRW Bound, α2(δ) ≤ 1 + g(0) − g(2δ). Verify that this is the Asymptotic Elias Bound when q = 2.
(b) Verify that the derivative of 1 + g(u²) − g(u² + 2δu + 2δ) is negative at u = 0.
(c) How do parts (a) and (b) show that the Second MRRW Bound is strictly smaller than
the Asymptotic Elias Bound when q = 2?
2.10.6 Asymptotic Gilbert–Varshamov Bound
We now turn to the only asymptotic lower bound we will present. This bound is the asymptotic version of both the Gilbert and the Varshamov Bounds. We will give this asymptotic
bound using the Gilbert Bound and leave as an exercise the verification that asymptotically
the Varshamov Bound gives the same result.
Theorem 2.10.8 (Asymptotic Gilbert–Varshamov Bound) If 0 < δ ≤ 1 − q^(−1), where q ≥ 2, then αq(δ) ≥ 1 − Hq(δ).
Proof: By the Gilbert Bound, Aq(n, δn) = Aq(n, ⌈δn⌉) ≥ q^n / Vq(n, ⌈δn⌉ − 1). Since ⌈δn⌉ − 1 ≤ ⌊δn⌋, Aq(n, δn) ≥ q^n / Vq(n, ⌊δn⌋). Thus

    αq(δ) = lim sup_{n→∞} n^(−1) logq Aq(n, δn)
          ≥ lim sup_{n→∞} (1 − n^(−1) logq Vq(n, ⌊δn⌋)) = 1 − Hq(δ)

by Lemma 2.10.3.
Exercise 143 Verify that, for q ≥ 2, the asymptotic version of the Varshamov Bound produces the lower bound αq(δ) ≥ 1 − Hq(δ) when 0 < δ ≤ 1 − q^(−1).
The Asymptotic Gilbert–Varshamov Bound was discovered in 1952. This bound guarantees (theoretically) the existence of a family of codes of increasing length whose relative distances approach δ while their rates approach 1 − Hq (δ). In the next section and in
Chapter 13 we will produce specific families of codes, namely lexicodes and Goppa codes,
which meet this bound. For 30 years no one was able to produce any family that exceeded
this bound and many thought that the Asymptotic Gilbert–Varshamov Bound in fact gave the
true value of αq (δ). However, in 1982, Tsfasman, Vlăduţ, and Zink [333] demonstrated that
a certain family of codes of increasing length exceeds the Asymptotic Gilbert–Varshamov
Bound. This family of codes comes from a collection of codes called algebraic geometry codes, described in Chapter 13, that generalize Goppa codes. There is, however, a
restriction on Fq for which this construction works: q must be a square with q ≥ 49.
In particular, no family of binary codes is currently known that surpasses the Asymptotic
Gilbert–Varshamov Bound. In Figure 2.2 we give five upper bounds and one lower bound on
αq (δ), for q = 2, discussed in this section and Section 1.12; see Exercise 140. In this figure,
the actual value of α2 (δ) is 0 to the right of the dashed line. Families of binary codes meeting or exceeding the Asymptotic Gilbert–Varshamov Bound lie in the dotted region of the
figure.
Exercise 144 Continuing with Exercise 140, add to the graphs drawn for q = 3 and q = 4
the graph of the inequality from the Asymptotic Gilbert–Varshamov Bound as done in
Figure 2.2 for q = 2.
2.11 Lexicodes
It is interesting that there is a class of binary linear codes whose construction is the greedy
construction for nonlinear codes described in Section 2.8, except that the order in which
the vectors are chosen is determined ahead of time. These codes are called lexicodes [39,
57, 192], and we will show that they indeed are linear. The construction implies that the
lexicodes meet the Gilbert Bound, a fact we leave as an exercise. This implies that we can
choose a family of lexicodes of increasing lengths which meet the Asymptotic Gilbert–
Varshamov Bound.
The algorithm for constructing lexicodes of length n and minimum distance d proceeds
as follows.
Figure 2.2 Asymptotic Bounds with q = 2. [The figure plots the rate R = α2(δ) (vertical axis, 0 to 1.0) against the relative distance δ (horizontal axis, 0 to 1.0), showing the Singleton, Plotkin, Hamming, Elias, and MRRW II upper bounds together with the Gilbert–Varshamov lower bound; the actual value of α2(δ) is 0 to the right of the dashed line at δ = 1/2, and the dotted region lies between the Gilbert–Varshamov and MRRW II curves.]
I. Order all n-tuples in lexicographic order:
0 · · · 000
0 · · · 001
0 · · · 010
0 · · · 011
0 · · · 100
⋮
II. Construct the class L of vectors of length n as follows:
(a) Put the zero vector 0 in L.
(b) Look for the first vector x of weight d in the lexicographic ordering. Put x in L.
(c) Look for the next vector in the lexicographic ordering whose distance from each
vector in L is d or more and add this to L.
(d) Repeat (c) until there are no more vectors in the lexicographic list to look at.
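The algorithm above can be sketched in a few lines (illustrative code; the function name is ours). Since the lexicographic scan starts at the zero vector, step II(a) is handled automatically. For n = 7 and d = 3 the construction yields 16 codewords.

```python
from itertools import product

def lexicode(n, d):
    """Greedy lexicographic construction: scan F_2^n in lexicographic order,
    keeping each vector whose distance to every kept vector is >= d.
    The zero vector is kept first."""
    L = []
    for v in product((0, 1), repeat=n):
        if all(sum(a != b for a, b in zip(v, c)) >= d for c in L):
            L.append(v)
    return L

L = lexicode(7, 3)
print(len(L))  # 16 codewords: the [7,4,3] Hamming code
```

Checking that the 16 codewords are closed under addition confirms, for this instance, the linearity proved below.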
The set L is actually a linear code, called the lexicode of length n and minimum distance
d. In fact if we halt the process earlier, but at just the right spots, we have linear subcodes of
L. To prove this, we need a preliminary result due to Levenshtein [192]. If u and v are binary
vectors of length n, we say that u < v provided that u comes before v in the lexicographic
order.
Lemma 2.11.1 If u, v, and w are binary vectors of length n with u < v + w and v <
u + w, then u + v < w.
Proof: If u < v + w, then u and v + w have the same leading entries after which u has a
0 and v + w has a 1. We can represent this as follows:
u = a0 · · · ,
v + w = a1 · · · .
Similarly as v < u + w, we have
v = b0 · · · ,
u + w = b1 · · · .
However, we do not know that the length i of a and the length j of b are the same. Assume
they are different, and by symmetry, that i < j. Then we have
v = b′ x · · · ,
u + w = b′ x · · · ,
where b′ is the first i entries of b. Computing w in two ways, we obtain
w = u + (u + w) = (a + b′ )x · · · ,
w = v + (v + w) = (a + b′ )(1 + x) · · · ,
a contradiction. So a and b are the same length, giving
u + v = (a + b)0 · · · ,
w = (a + b)1 · · · ,
showing u + v < w.
Theorem 2.11.2 Label the vectors in the lexicode in the order in which they are generated so that c0 is the zero vector.
(i) L is a linear code and the vectors c_{2^i} are a basis of L.
(ii) After c_{2^i} is chosen, the next 2^i − 1 vectors generated are c1 + c_{2^i}, c2 + c_{2^i}, . . . , c_{2^i−1} + c_{2^i}.
(iii) Let Li = {c0, c1, . . . , c_{2^i−1}}. Then Li is an [n, i, d] linear code.
Proof: If we prove (ii), we have (i) and (iii). The proof of (ii) is by induction on i. Clearly it is true for i = 1. Assume the first 2^i vectors generated are as claimed. Then Li is linear with basis {c1, c2, . . . , c_{2^{i−1}}}. We show that Li+1 is linear with the next 2^i vectors generated in
order by adding c_{2^i} to each of the previously chosen vectors in Li. If not, there is an r < 2^i such that c_{2^i+r} ≠ c_{2^i} + cr. Choose r to be the smallest such value. As d(c_{2^i} + cr, c_{2^i} + cj) = d(cr, cj) ≥ d for j < r, c_{2^i} + cr was a possible vector to be chosen. Since it was not, it must have come too late in the lexicographic order; so

    c_{2^i+r} < c_{2^i} + cr.    (2.46)

As d(cr + c_{2^i+r}, cj) = d(c_{2^i+r}, cr + cj) ≥ d for j < 2^i by linearity of Li, cr + c_{2^i+r} could have been chosen to be in the code instead of c_{2^i} (which it cannot equal). So it must be that

    c_{2^i} < cr + c_{2^i+r}.    (2.47)

If j < r, then c_{2^i+j} = c_{2^i} + cj by the assumption on r. Hence d(c_{2^i+r} + c_{2^i}, cj) = d(c_{2^i+r}, c_{2^i+j}) ≥ d for j < r. So c_{2^i+r} + c_{2^i} could have been chosen to be a codeword instead of cr. The fact that it was not implies that

    cr < c_{2^i+r} + c_{2^i}.    (2.48)

But then (2.47) and (2.48) with Lemma 2.11.1 imply c_{2^i} + cr < c_{2^i+r}, contradicting (2.46).
The codes Li satisfy the inclusions L1 ⊂ L2 ⊂ · · · ⊂ Lk = L, where k is the dimension
of L. In general this dimension is not known before the construction. If i < k, the left-most
coordinates are always 0 (exactly how many is also unknown). If we puncture Li on these
zero coordinates, we actually get a lexicode of smaller length.
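The ordering claimed in Theorem 2.11.2(ii) can be observed directly on a small example (illustrative code, our own names): for the lexicode of length 5 and minimum distance 2, every codeword c_{2^i+r} equals c_{2^i} + c_r.

```python
from itertools import product

def lexicode(n, d):
    # greedy lexicographic construction over F_2^n (zero vector kept first)
    L = []
    for v in product((0, 1), repeat=n):
        if all(sum(a != b for a, b in zip(v, c)) >= d for c in L):
            L.append(v)
    return L

c = lexicode(5, 2)
xor = lambda u, v: tuple(a ^ b for a, b in zip(u, v))
# Theorem 2.11.2(ii): the codewords after c_{2^i} are c_1 + c_{2^i}, c_2 + c_{2^i}, ...
ordered = all(c[2 ** i + r] == xor(c[2 ** i], c[r])
              for i in range(4) for r in range(1, 2 ** i))
print(len(c), ordered)
```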
Exercise 145 Do the following:
(a) Construct the codes Li of length 5 and minimum distance 2.
(b) Verify that these codes are linear and the vectors are generated in the order described
by Theorem 2.11.2.
(c) Repeat (a) and (b) for length 5 and minimum distance 3.
Exercise 146 Find an ordering of F52 so that the greedy algorithm does not produce a linear
code.
Exercise 147 Prove that the covering radius of L is d − 1 or less. Also prove that the
lexicodes meet the Gilbert Bound. Hint: See Exercise 132.
The lexicode L is the largest of the Li constructed in Theorem 2.11.2. We can give a
parity check matrix for L provided d ≥ 3, which is reminiscent of the parity check matrix
constructed in the proof of the Varshamov Bound. If C is a lexicode of length n with d ≥ 3,
construct its parity check matrix H = [hn · · · h1 ] as follows (where hi is a column vector).
Regard the columns hi as binary numbers where 1 ↔ (· · · 001)T , 2 ↔ (· · · 010)T , etc. Let h1
be the column corresponding to 1 and h2 the column corresponding to 2. Once hi−1 , . . . , h1
are chosen, choose hi to be the column corresponding to the smallest number which is not
a linear combination of d − 2 or fewer of hi−1 , . . . , h1 . Note that the length of the columns
does not have to be determined ahead of time. Whenever hi corresponds to a number that
is a power of 2, the length of the columns increases and zeros are placed on the tops of the
columns h j for j < i.
Example 2.11.3 If n = 7 and d = 3, the parity check matrix for the lexicode L is

        [1 1 1 1 0 0 0]
    H = [1 1 0 0 1 1 0].
        [1 0 1 0 1 0 1]

So we recognize this lexicode as the [7, 4, 3] Hamming code.
As this example illustrates, the Hamming code H3 is a lexicode. In fact, all binary
Hamming and Golay codes are lexicodes [39].
Exercise 148 Compute the parity check matrix for the lexicode of length 5 and minimum
distance 3. Check that it yields the code produced in Exercise 145(c).
Exercise 149 Compute the generator and parity check matrices for the lexicodes of length
6 and minimum distance 2, 3, 4, and 5.
3 Finite fields
For deeper analysis and construction of linear codes we need to make use of the basic theory
of finite fields. In this chapter we review that theory. We will omit many of the proofs; for
those readers interested in the proofs and other properties of finite fields, we refer you to
[18, 196, 233].
3.1 Introduction
A field is a set F together with two operations: +, called addition, and ·, called multiplication,
which satisfy the following axioms. The set F is an abelian group under + with additive
identity called zero and denoted 0; the set F∗ of all nonzero elements of F is also an
abelian group under multiplication with multiplicative identity called one and denoted 1; and
multiplication distributes over addition. We will usually omit the symbol for multiplication
and write ab for the product a · b. The field is finite if F has a finite number of elements;
the number of elements in F is called the order of F. In Section 1.1 we gave three fields
denoted F2 , F3 , and F4 of orders 2, 3, and 4, respectively. In general, we will denote a field
with q elements by Fq; another common notation is GF(q), read "the Galois field with q elements."
If p is a prime, the integers modulo p form a field, which is then denoted F p . This is
not true if p is not a prime. These are the simplest examples of finite fields. As we will see
momentarily, every finite field contains some F p as a subfield.
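A brute-force check makes the first claim concrete (a sketch; the function name is ours): Z/nZ is a field exactly when every nonzero element has a multiplicative inverse modulo n.

```python
def is_field_mod(n):
    """Z/nZ is a field iff every nonzero element has a multiplicative
    inverse modulo n; test this by brute force."""
    return all(any(a * b % n == 1 for b in range(1, n)) for a in range(1, n))

print([n for n in range(2, 13) if is_field_mod(n)])  # [2, 3, 5, 7, 11]
```

Only the primes survive, in line with Exercise 150 below.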
Exercise 150 Prove that the integers modulo n do not form a field if n is not prime.
The finiteness of Fq implies that there exists a smallest positive integer p such that
1 + · · · + 1 ( p 1s) is 0. The integer p is a prime, as verified in Exercise 151, and is called
the characteristic of Fq . If a is a positive integer, we will denote the sum of a 1s in the
field by a. Also if we wish to write the sum of a αs where α is in the field, we write this
as either aα or a · α. Notice that pα = 0 for all α ∈ Fq . The set of p distinct elements
{0, 1, 2, . . . , ( p − 1)} of Fq is isomorphic to the field F p of integers modulo p. As a field
isomorphic to F p is contained in Fq , we will simplify terminology and say that F p is a
subfield of Fq ; this subfield F p is called the prime subfield of Fq . The fact that F p is a
subfield of Fq gives us crucial information about q; specifically, by Exercise 151, the field
Fq is also a finite dimensional vector space over Fp, say of dimension m. Therefore q = p^m, as this is the number of vectors in a vector space of dimension m over Fp.
Although it is not obvious, all finite fields with the same number of elements are isomorphic. Thus our notation Fq is not ambiguous; Fq will be any representation of a field
with q elements. As we did with the prime subfield of Fq , if we say that Fr is a subfield of
Fq , we actually mean that Fq contains a subfield with r elements. If Fq has a subfield with
r elements, that subfield is unique. Hence there is no ambiguity when we say that Fr is a
subfield of Fq . It is important to note that although all finite fields of order q are isomorphic,
one field may have many different representations. The exact form that we use for the field
may be crucial in its application to coding theory. We summarize the results we have just
given in a theorem; all but the last part are proved in Exercise 151.
Theorem 3.1.1 Let Fq be a finite field with q elements. Then:
(i) q = p^m for some prime p,
(ii) Fq contains the subfield F p ,
(iii) Fq is a vector space over F p of dimension m,
(iv) pα = 0 for all α ∈ Fq , and
(v) Fq is unique up to isomorphism.
Exercise 151 Prove the following:
(a) If a and b are in a field F with ab = 0, then either a = 0 or b = 0.
(b) If F is a finite field, then the characteristic of F is a prime p and {0, 1, 2, . . . , ( p − 1)}
is a subfield of F.
(c) If F is a field of characteristic p, then pα = 0 for all α ∈ F.
(d) A finite field F of characteristic p is a finite dimensional vector space over its prime
subfield and contains p m elements, where m is the dimension of this vector space.
Exercise 152 Let Fq have characteristic p. Prove that (α + β) p = α p + β p for all α, β ∈
Fq .
Throughout this chapter we will let p denote a prime number and q = p m , where m is a
positive integer.
3.2 Polynomials and the Euclidean Algorithm
Let x be an indeterminate. The set of polynomials in x with coefficients in Fq is denoted by
Fq [x]. This set forms a commutative ring with unity under ordinary polynomial addition and
multiplication. A commutative ring with unity satisfies the same axioms as a field except the
nonzero elements do not necessarily have multiplicative inverses. In fact Fq [x] is an integral
domain as well; recall that an integral domain is a commutative ring with unity such that
the product of any two nonzero elements in the ring is also nonzero. The ring Fq [x] plays
a key role not only in the construction of finite fields but also in the construction of certain
families of codes. So a typical polynomial in Fq [x] is f (x) = ∑_{i=0}^{n} ai x^i. As usual ai is
the coefficient of the term ai x^i of degree i. The degree of a polynomial f (x) is the highest
degree of any term with a nonzero coefficient and is denoted deg f (x); the zero polynomial
does not have a degree. The coefficient of the highest degree term is called the leading
coefficient. We will usually write polynomials with terms in either increasing or decreasing
degree order; e.g. a0 + a1 x + a2 x 2 + · · · + an x n or an x n + an−1 x n−1 + · · · + a1 x + a0 .
Exercise 153 Prove that Fq [x] is a commutative ring with unity and an integral domain as
well.
Exercise 154 Multiply (x 3 + x 2 + 1)(x 3 + x + 1)(x + 1) in the ring F2 [x]. Do the same
multiplication in F3 [x].
Exercise 155 Show that the degree of the product of two polynomials is the sum of the
degrees of the polynomials.
A polynomial is monic provided its leading coefficient is 1. Let f (x) and g(x) be polynomials in Fq [x]. We say that f (x) divides g(x), denoted f (x) | g(x), if there exists a
polynomial h(x) ∈ Fq [x] such that g(x) = f (x)h(x). The polynomial f (x) is called a divisor or factor of g(x). The greatest common divisor of f (x) and g(x), at least one of which
is nonzero, is the monic polynomial in Fq [x] of largest degree dividing both f (x) and g(x).
The greatest common divisor of two polynomials is uniquely determined and is denoted by
gcd( f (x), g(x)). The polynomials f (x) and g(x) are relatively prime if gcd( f (x), g(x)) = 1.
Many properties of the ring Fq [x] are analogous to properties of the integers. One can
divide two polynomials and obtain a quotient and remainder just as one can do with integers.
In part (i) of the next theorem, we state this fact, which is usually called the Division
Algorithm.
Theorem 3.2.1 Let f (x) and g(x) be in Fq [x] with g(x) nonzero.
(i) (Division Algorithm) There exist unique polynomials h(x), r (x) ∈ Fq [x] such that
f (x) = g(x)h(x) + r (x),
where deg r (x) < deg g(x) or r (x) = 0.
(ii) If f (x) = g(x)h(x) + r (x), then gcd( f (x), g(x)) = gcd(g(x), r (x)).
We can use the Division Algorithm recursively together with part (ii) of the previous
theorem to find the gcd of two polynomials. This process is known as the Euclidean Algorithm. The Euclidean Algorithm for polynomials is analogous to the Euclidean Algorithm
for integers. We state it in the next theorem and then illustrate it with an example.
Theorem 3.2.2 (Euclidean Algorithm) Let f (x) and g(x) be polynomials in Fq [x] with
g(x) nonzero.
(i) Perform the following sequence of steps until rn (x) = 0 for some n:
f (x) = g(x)h1 (x) + r1 (x),                  where deg r1 (x) < deg g(x),
g(x) = r1 (x)h2 (x) + r2 (x),                 where deg r2 (x) < deg r1 (x),
r1 (x) = r2 (x)h3 (x) + r3 (x),               where deg r3 (x) < deg r2 (x),
    ...
rn−3 (x) = rn−2 (x)hn−1 (x) + rn−1 (x),       where deg rn−1 (x) < deg rn−2 (x),
rn−2 (x) = rn−1 (x)hn (x) + rn (x),           where rn (x) = 0.
Then gcd( f (x), g(x)) = crn−1 (x), where c ∈ Fq is chosen so that crn−1 (x) is
monic.
(ii) There exist polynomials a(x), b(x) ∈ Fq [x] such that
a(x) f (x) + b(x)g(x) = gcd( f (x), g(x)).
The sequence of steps in (i) eventually terminates because at each stage the degree of the
remainder decreases by at least 1. By repeatedly applying Theorem 3.2.1(ii), we have that
crn−1 (x) = gcd(rn−2 (x), rn−3 (x)) = gcd(rn−3 (x), rn−4 (x)) = · · · = gcd( f (x), g(x)). This
explains why (i) produces the desired gcd.
Technically, the Euclidean Algorithm is only (i) of this theorem. However, (ii) is a natural
consequence of (i), and seems to possess no name of its own in the literature. As we use (ii) so
often, we include both in the term “Euclidean Algorithm.” Part (ii) is obtained by beginning
with the next to last equation rn−3 (x) = rn−2 (x)h n−1 (x) + rn−1 (x) in the sequence in (i)
and solving for rn−1 (x) in terms of rn−2 (x) and rn−3 (x). Using the previous equation, solve
for rn−2 (x) and substitute into the equation for rn−1 (x) to obtain rn−1 (x) as a combination
of rn−3 (x) and rn−4 (x). Continue up through the sequence until we obtain rn−1 (x) as a
combination of f (x) and g(x). We illustrate all of this in the next example.
Example 3.2.3 We compute gcd(x 5 + x 4 + x 2 + 1, x 3 + x 2 + x) in the ring F2 [x] using
the Euclidean Algorithm. Part (i) of the algorithm produces the following sequence.
x 5 + x 4 + x 2 + 1 = (x 3 + x 2 + x)(x 2 + 1) + x + 1
x 3 + x 2 + x = (x + 1)(x 2 + 1) + 1
x + 1 = 1(x + 1) + 0.
Thus 1 = gcd(x + 1, 1) = gcd(x 3 + x 2 + x, x + 1) = gcd(x 5 + x 4 + x 2 + 1, x 3 + x 2 + x).
Now we find a(x) and b(x) such that a(x)(x 5 + x 4 + x 2 + 1) + b(x)(x 3 + x 2 + x) = 1 by
reversing the above steps. We begin with the last equation in our sequence with a nonzero
remainder, which we first solve for. This yields
1 = (x 3 + x 2 + x) − (x + 1)(x 2 + 1).
(3.1)
Now x + 1 is the remainder in the first equation in our sequence; solving this for x + 1 and
plugging into (3.1) produces
1 = (x 3 + x 2 + x) − [(x 5 + x 4 + x 2 + 1) − (x 3 + x 2 + x)(x 2 + 1)](x 2 + 1)
= (x 2 + 1)(x 5 + x 4 + x 2 + 1) + x 4 (x 3 + x 2 + x).
So a(x) = x 2 + 1 and b(x) = x 4 .
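The computation of Example 3.2.3 can be carried out mechanically. The following Python sketch is our illustration, not part of the text: it represents a polynomial in F2 [x] as an integer whose bit i is the coefficient of x^i (so x^5 + x^4 + x^2 + 1 is 0b110101), making addition a bitwise XOR; the function names are ours.

```python
def deg(p):
    return p.bit_length() - 1          # deg(0) is treated as -1

def pmul(f, g):
    # multiply in F2[x]; bit i of an int is the coefficient of x^i
    r = 0
    while g:
        if g & 1:
            r ^= f
        f <<= 1
        g >>= 1
    return r

def pdivmod(f, g):
    # Division Algorithm (Theorem 3.2.1): f = g*h + r with deg r < deg g
    h = 0
    while f and deg(f) >= deg(g):
        shift = deg(f) - deg(g)
        h ^= 1 << shift
        f ^= g << shift
    return h, f

def pgcd_ext(f, g):
    # Euclidean Algorithm with back-substitution: gcd(f, g) = a*f + b*g.
    # Over F2 every nonzero leading coefficient is 1, so r0 is already monic.
    r0, r1, a0, a1, b0, b1 = f, g, 1, 0, 0, 1
    while r1:
        h, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        a0, a1 = a1, a0 ^ pmul(h, a1)
        b0, b1 = b1, b0 ^ pmul(h, b1)
    return r0, a0, b0

# Example 3.2.3: gcd(x^5 + x^4 + x^2 + 1, x^3 + x^2 + x) over F2
d, a, b = pgcd_ext(0b110101, 0b1110)
assert (d, a, b) == (1, 0b101, 0b10000)    # gcd 1, a(x) = x^2 + 1, b(x) = x^4
assert pmul(a, 0b110101) ^ pmul(b, 0b1110) == d
```

The final assertions recover exactly the polynomials a(x) = x^2 + 1 and b(x) = x^4 found by reversing the steps in Example 3.2.3.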
Exercise 156 Find gcd(x 6 + x 5 + x 4 + x 3 + x + 1, x 5 + x 3 + x 2 + x) in F2 [x]. Find
a(x) and b(x) such that gcd(x 6 + x 5 + x 4 + x 3 + x + 1, x 5 + x 3 + x 2 + x) = a(x)(x 6 +
x 5 + x 4 + x 3 + x + 1) + b(x)(x 5 + x 3 + x 2 + x).
Exercise 157 Find gcd(x 5 − x 4 + x + 1, x 3 + x) in F3 [x]. Find a(x) and b(x) such that
gcd(x 5 − x 4 + x + 1, x 3 + x) = a(x)(x 5 − x 4 + x + 1) + b(x)(x 3 + x).
Exercise 158 Let f (x) and g(x) be polynomials in Fq [x].
(a) If k(x) is a divisor of f (x) and g(x), prove that k(x) is a divisor of a(x) f (x) + b(x)g(x)
for any a(x),b(x) ∈ Fq [x].
(b) If k(x) is a divisor of f (x) and g(x), prove that k(x) is a divisor of gcd( f (x),
g(x)).
Exercise 159 Let f (x) be a polynomial over Fq of degree n.
(a) Prove that if α ∈ Fq is a root of f (x), then x − α is a factor of f (x).
(b) Prove that f (x) has at most n roots in any field containing Fq .
3.3 Primitive elements
When working with a finite field, one needs to be able to add and multiply as simply as
possible. In Theorem 3.1.1(iii), we stated that Fq is a vector space over F p of dimension m.
So a simple way to add field elements is to write them as m-tuples over F p and add them
using ordinary vector addition, as we will see in Section 3.4. Unfortunately, multiplying
such m-tuples is far from simple. We need another way to write the field elements so that
multiplication is easy; then we need a way to connect this form of the field elements to
the m-tuple form. The following theorem will assist us with this. Recall that the set Fq∗ of
nonzero elements in Fq is a group.
Theorem 3.3.1 We have the following:
(i) The group Fq∗ is cyclic of order q − 1 under the multiplication of Fq .
(ii) If γ is a generator of this cyclic group, then
Fq = {0, 1 = γ 0 , γ , γ 2 , . . . , γ q−2 },
and γ i = 1 if and only if (q − 1) | i.
Proof: From the Fundamental Theorem of Finite Abelian Groups [130], Fq∗ is a direct
product of cyclic groups of orders m1 , m2 , . . . , ma , where mi | mi+1 for 1 ≤ i < a and
m1 m2 · · · ma = q − 1. In particular α^{ma} = 1 for all α ∈ Fq∗ . Thus the polynomial x^{ma} − 1
has at least q − 1 roots, a contradiction to Exercise 159 unless a = 1 and ma = q − 1. Thus
Fq∗ is cyclic giving (i). Part (ii) follows from the properties of cyclic groups.
Each generator γ of Fq∗ is called a primitive element of Fq . When the nonzero elements of
a finite field are expressed as powers of γ , the multiplication in the field is easily carried out
according to the rule γ i γ j = γ i+ j = γ s , where 0 ≤ s ≤ q − 2 and i + j ≡ s (mod q − 1).
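This multiplication rule is easy to check numerically. The sketch below is our illustration (not the book's): for the prime field F7 it finds the primitive elements by computing multiplicative orders, then verifies that exponents of a generator add modulo q − 1.

```python
def order(a, q):
    # multiplicative order of a in Fq* (q prime here, so arithmetic is mod q)
    x, k = a % q, 1
    while x != 1:
        x = x * a % q
        k += 1
    return k

# primitive elements of F7 are the generators of the cyclic group F7* of order 6
primitive = [a for a in range(1, 7) if order(a, 7) == 6]
assert primitive == [3, 5]

# with gamma = 3: gamma^i * gamma^j = gamma^s where i + j ≡ s (mod q - 1)
gamma = 3
assert (pow(gamma, 4, 7) * pow(gamma, 5, 7)) % 7 == pow(gamma, (4 + 5) % 6, 7)
```

Only 3 and 5 generate F7∗, which matches the count φ(6) = 2 of Theorem 3.3.3 below.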
Exercise 160 Find all primitive elements in the fields F2 , F3 , and F4 of Examples 1.1.1,
1.1.2, and 1.1.3. Pick one of the primitive elements γ of F4 and rewrite the addition and
multiplication tables of F4 using the elements {0, 1 = γ 0 , γ , γ 2 }.
Let γ be a primitive element of Fq . Then γ q−1 = 1 by definition. Hence (γ i )q−1 = 1 for
0 ≤ i ≤ q − 2 showing that the elements of Fq∗ are roots of x q−1 − 1 ∈ F p [x] and hence of
x q − x. As 0 is a root of x q − x, by Exercise 159 we now see that the elements of Fq are
precisely the roots of x q − x giving this important theorem.
Theorem 3.3.2 The elements of Fq are precisely the roots of x q − x.
In Theorem 3.1.1(v) we claim that the field with q = p m elements is unique. Theorem 3.3.2 shows that a field with q elements is the smallest field containing F p and all
the roots of x q − x. Such a field is termed a splitting field of the polynomial x q − x over
F p , that is, the smallest extension field of F p containing all the roots of the polynomial. In
general splitting fields of a fixed polynomial over a given field are isomorphic; by carefully
defining a map between the roots of the polynomial in one field and the roots in the other,
the map can be shown to be an isomorphism of the splitting fields.
Exercise 161 Using the table for F4 in Example 1.1.3, verify that all the elements of F4
are roots of x 4 − x = 0.
In analyzing the field structure, it will be useful to know the number of primitive elements
in Fq and how to find them all once one primitive element has been found. Since Fq∗ is cyclic,
we recall a few facts about finite cyclic groups. In any finite cyclic group G of order n with
generator g, the generators of G are precisely the elements g i where gcd(i, n) = 1. We let
φ(n) be the number of integers i with 1 ≤ i ≤ n such that gcd(i, n) = 1; φ is called the
Euler totient or the Euler φ-function. So there are φ(n) generators of G. The order of an
element α ∈ G is the smallest positive integer i such that α i = 1. An element of G has
order d if and only if d | n. Furthermore g i has order d = n/ gcd(i, n), and there are φ(d)
elements of order d. When speaking of field elements α ∈ Fq∗ , the order of α is its order in
the multiplicative group Fq∗ . In particular, primitive elements of Fq are those of order q − 1.
The next theorem follows from this discussion.
Theorem 3.3.3 Let γ be a primitive element of Fq .
(i) There are φ(q − 1) primitive elements in Fq ; these are the elements γ i where
gcd(i, q − 1) = 1.
(ii) For any d where d | (q − 1), there are φ(d) elements in Fq of order d; these are the
elements γ (q−1)i/d where gcd(i, d) = 1.
An element ξ ∈ Fq is an nth root of unity provided ξ^n = 1, and is a primitive nth root
of unity if in addition ξ^s ≠ 1 for 0 < s < n. A primitive element γ of Fq is therefore a
primitive (q − 1)st root of unity. It follows from Theorem 3.3.1 that the field Fq contains a
primitive nth root of unity if and only if n | (q − 1), in which case γ (q−1)/n is a primitive
nth root of unity.
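Theorem 3.3.3 can be illustrated concretely in F13, whose multiplicative group has order 12. The Python sketch below is ours (the choice γ = 2, a primitive element of F13, is an assumption the code itself verifies by hitting every order): it tallies element orders and checks that each divisor d of 12 accounts for exactly φ(d) elements.

```python
from math import gcd

def phi(n):
    # Euler phi-function: count of 1 <= i <= n with gcd(i, n) = 1
    return sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)

q, gamma = 13, 2              # gamma = 2 is a primitive element of F13
orders = {}
for i in range(q - 1):
    x = pow(gamma, i, q)      # runs through all of F13*
    y, k = x, 1
    while y != 1:             # multiplicative order of x
        y = y * x % q
        k += 1
    orders.setdefault(k, []).append(x)

# Theorem 3.3.3(ii): exactly phi(d) elements of order d for each d | 12
for d in (1, 2, 3, 4, 6, 12):
    assert len(orders[d]) == phi(d)
```

In particular orders[12] lists the φ(12) = 4 primitive elements of F13.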
Exercise 162 (a) Find a primitive element γ in the field Fq given below. (b) Then write
every nonzero element in Fq as a power of γ . (c) What is the order of each element and
which are primitive? (d) Verify that there are precisely φ(d) elements of order d for every
d | (q − 1).
(i) F5 ,
(ii) F7 ,
(iii) F13 .
Exercise 163 Let γ be a primitive element of Fq , where q is odd.
(a) Show that the equation x 2 = 1 has only two solutions, 1 and −1.
(b) Show that γ (q−1)/2 = −1.
Exercise 164 If q ≠ 2, show that ∑_{α∈Fq} α = 0.
Exercise 165 What is the smallest field of characteristic 2 that contains a:
(a) primitive ninth root of unity?
(b) primitive 11th root of unity?
What is the smallest field of characteristic 3 that contains a:
(c) primitive seventh root of unity?
(d) primitive 11th root of unity?
3.4 Constructing finite fields
We are now ready to link the additive structure of a finite field arising from the vector
space interpretation with the multiplicative structure arising from the powers of a primitive
element and actually construct finite fields.
A nonconstant polynomial f (x) ∈ Fq [x] is irreducible over Fq provided it does not factor
into a product of two polynomials in Fq [x] of smaller degree. The irreducible polynomials
in Fq [x] are like the prime numbers in the ring of integers. For example, every integer greater
than 1 is a unique product of positive primes. The same result holds in Fq [x], making Fq [x]
a unique factorization domain.
Theorem 3.4.1 Let f (x) be a nonconstant polynomial. Then
f (x) = p1 (x)a1 p2 (x)a2 · · · pk (x)ak ,
where each pi (x) is irreducible, the pi (x)s are unique up to scalar multiplication, and the
ai s are unique.
Not only is Fq [x] a unique factorization domain, it is also a principal ideal domain. An
ideal I in a commutative ring R is a nonempty subset of the ring that is closed under
subtraction such that the product of an element in I with an element in R is always in I.
The ideal I is principal provided there is an a ∈ R such that I = {ra | r ∈ R}; this ideal
will be denoted (a). A principal ideal domain is an integral domain in which each ideal is
principal. Exercises 153 and 166 show that Fq [x] is a principal ideal domain. (The fact that
Fq [x] is a unique factorization domain actually follows from the fact that it is a principal
ideal domain.)
Exercise 166 Show using the Division Algorithm that every ideal of Fq [x] is a principal
ideal.
To construct a field of characteristic p, we begin with a polynomial f (x) ∈ F p [x] which
is irreducible over F p . Suppose that f (x) has degree m. By using the Euclidean Algorithm
it can be proved that the residue class ring
F p [x]/( f (x))
is actually a field and hence a finite field Fq with q = p m elements; see Exercise 167. Every
element of the residue class ring is a coset g(x) + ( f (x)), where g(x) is uniquely determined
of degree at most m − 1. We can compress the notation by writing the coset as a vector in
Fmp with the correspondence
gm−1 x m−1 + gm−2 x m−2 + · · · + g1 x + g0 + ( f (x)) ↔ gm−1 gm−2 · · · g1 g0 .
(3.2)
This vector notation allows you to add in the field using ordinary vector addition.
Exercise 167 Let f (x) be an irreducible polynomial of degree m in F p [x]. Prove that
F p [x]/( f (x))
is a finite field with p m elements.
Example 3.4.2 The polynomial f (x) = x 3 + x + 1 is irreducible over F2 ; if it were reducible, it would have a factor of degree 1 and hence a root in F2 , which it does not. So
F8 = F2 [x]/( f (x)) and, using the correspondence (3.2), the elements of F8 are given by
Cosets                      Vectors
0 + ( f (x))                000
1 + ( f (x))                001
x + ( f (x))                010
x + 1 + ( f (x))            011
x^2 + ( f (x))              100
x^2 + 1 + ( f (x))          101
x^2 + x + ( f (x))          110
x^2 + x + 1 + ( f (x))      111
As an illustration of addition, adding x + ( f (x)) to x 2 + x + 1 + ( f (x)) yields x 2 + 1 +
( f (x)), which corresponds to adding 010 to 111 and obtaining 101 in F32 .
How do you multiply? To multiply g1 (x) + ( f (x)) times g2 (x) + ( f (x)), first use the
Division Algorithm to write
g1 (x)g2 (x) = f (x)h(x) + r (x),
(3.3)
where deg r (x) ≤ m − 1 or r (x) = 0. Then (g1 (x) + ( f (x)))(g2 (x) + ( f (x))) = r (x) +
( f (x)). The notation is rather cumbersome and can be simplified if we replace x by α
and let f (α) = 0; we justify this shortly. From (3.3), g1 (α)g2 (α) = r (α) and we extend our
correspondence (3.2) to
gm−1 gm−2 · · · g1 g0 ↔ gm−1 α m−1 + gm−2 α m−2 + · · · + g1 α + g0 .
(3.4)
So to multiply in Fq , we simply multiply polynomials in α in the ordinary way and use
the equation f (α) = 0 to reduce powers of α higher than m − 1 to polynomials in α of
degree less than m. Notice that the subset {0α m−1 + 0α m−2 + · · · + 0α + a0 | a0 ∈ F p } =
{a0 | a0 ∈ F p } is the prime subfield of Fq .
We continue with our example of F8 adding this new correspondence. Notice that the
group F∗8 is cyclic of order 7 and hence all nonidentity elements of F∗8 are primitive. In
particular α is primitive, and we include powers of α in our table below.
Example 3.4.3 Continuing with Example 3.4.2, using correspondence (3.4), we obtain
Vectors    Polynomials in α    Powers of α
000        0                   0
001        1                   1 = α^0
010        α                   α
011        α + 1               α^3
100        α^2                 α^2
101        α^2 + 1             α^6
110        α^2 + α             α^4
111        α^2 + α + 1         α^5
The column “powers of α” is obtained by using f (α) = α 3 + α + 1 = 0, which implies that
α 3 = α + 1. So α 4 = αα 3 = α(α + 1) = α 2 + α, α 5 = αα 4 = α(α 2 + α) = α 3 + α 2 =
α 2 + α + 1, etc.
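The table of Example 3.4.3 can be regenerated by machine. This Python sketch is our companion illustration (the 3-bit integer representation and function name are ours): an element of F8 is the bit vector of its coefficients, addition is XOR, and multiplication reduces by f(α) = α^3 + α + 1 = 0.

```python
f = 0b1011                     # f(x) = x^3 + x + 1, irreducible over F2

def mul(a, b):
    # multiply two elements of F8 = F2[x]/(f(x)), stored as 3-bit vectors
    r = 0
    while b:
        if b & 1:
            r ^= a             # addition in F8 is XOR of coefficient vectors
        a <<= 1
        if a & 0b1000:         # degree reached 3: reduce via alpha^3 = alpha + 1
            a ^= f
        b >>= 1
    return r

# successive powers of alpha = 010 reproduce the table of Example 3.4.3
powers = [0b001]
for _ in range(6):
    powers.append(mul(powers[-1], 0b010))
assert powers == [0b001, 0b010, 0b100, 0b011, 0b110, 0b111, 0b101]

# multiplication via exponents: alpha^i * alpha^j = alpha^((i+j) mod 7)
assert mul(powers[3], powers[5]) == powers[(3 + 5) % 7]
```

The list `powers` is exactly the "Powers of α" column read against the "Vectors" column.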
Exercise 168 In the field F8 given in Example 3.4.3, simplify
(α 2 + α 6 − α + 1)(α 3 + α)/(α 4 + α).
Hint: Use the vector form of the elements to do additions and subtractions and the powers
of α to do multiplications and divisions.
We describe this construction by saying that Fq is obtained from F p by “adjoining” a
root α of f (x) to F p . This root α is formally given by α = x + ( f (x)) in the residue class
ring F p [x]/( f (x)); therefore g(x) + ( f (x)) = g(α) and f (α) = f (x + ( f (x))) = f (x) +
( f (x)) = 0 + ( f (x)).
In Example 3.4.3, we were fortunate that α was a primitive element of F8 . In general, this
will not be the case. We say that an irreducible polynomial over F p of degree m is primitive
provided that it has a root that is a primitive element of Fq = F pm . Ideally we would like
to start with a primitive polynomial to construct our field, but that is not a requirement
(see Exercise 174). It is worth noting that the irreducible polynomial we begin with can be
multiplied by a constant to make it monic as that has no effect on the ideal generated by the
polynomial or the residue class ring.
It is not obvious, but either by using the theory of splitting fields or by counting the number
of irreducible polynomials over a finite field, one can show that irreducible polynomials of
any degree exist. In particular we have the following result.
Theorem 3.4.4 For any prime p and any positive integer m, there exists a finite field, unique
up to isomorphism, with q = p m elements.
Since constructing finite fields requires irreducible polynomials, we note that tables of
irreducible and primitive polynomials over F2 can be found in [256].
Remark: In the construction of Fq by adjoining a root of an irreducible polynomial f (x)
to F p , the field F p can be replaced by any finite field Fr , where r is a power of p and f (x)
by an irreducible polynomial of degree m in Fr [x] for some positive integer m. The field
constructed contains Fr as a subfield and is of order r m .
Exercise 169
(a) Find all irreducible polynomials of degrees 1, 2, 3, and 4 over F2 .
(b) Compute the product of all irreducible polynomials of degrees 1 and 2 in F2 [x].
(c) Compute the product of all irreducible polynomials of degrees 1 and 3 in F2 [x].
(d) Compute the product of all irreducible polynomials of degrees 1, 2, and 4 in F2 [x].
(e) Make a conjecture based on the results of (b), (c), and (d).
(f) In part (a), you found two irreducible polynomials of degree 3. The roots of these
polynomials lie in F8 . Using the table in Example 3.4.3 find the roots of these two
polynomials as powers of α.
Exercise 170 Find all monic irreducible polynomials of degrees 1 and 2 over F3 . Then
compute their product in F3 [x]. Does this result confirm your conjecture of Exercise 169(e)?
If not, modify your conjecture.
Exercise 171 Find all monic irreducible polynomials of degrees 1 and 2 over F4 . Then
compute their product in F4 [x]. Does this result confirm your conjecture of Exercise 169(e)
or your modified conjecture in Exercise 170? If not, modify your conjecture again.
Exercise 172 In Exercise 169, you found an irreducible polynomial of degree 3 different
from the one used to construct F8 in Examples 3.4.2 and 3.4.3. Let β be a root of this second
polynomial and construct the field F8 by adjoining β to F2 and giving a table with each
vector in F32 associated to 0 and the powers of β.
Exercise 173 By Exercise 169, the polynomial f (x) = x 4 + x + 1 ∈ F2 [x] is irreducible
over F2 . Let α be a root of f (x).
(a) Construct the field F16 by adjoining α to F2 and giving a table with each vector in F42
associated to 0 and the powers of α.
(b) Which powers of α are primitive elements of F16 ?
(c) Find the roots of the irreducible polynomials of degrees 1, 2, and 4 from Exercise 169(a).
Exercise 174 Let f (x) = x 2 + x + 1 ∈ F5 [x].
(a) Prove that f (x) is irreducible over F5 .
(b) By part (a) F25 = F5 [x]/( f (x)). Let α be a root of f (x). Show that α is not primitive.
(c) Find a primitive element in F25 of the form aα + b, where a, b ∈ F5 .
Exercise 175 By Exercise 169, x 2 + x + 1, x 3 + x + 1, and x 4 + x + 1 are irreducible
over F2 . Is x 5 + x + 1 irreducible over F2 ?
Exercise 176 Define a function τ : Fq → Fq by τ (0) = 0 and τ (α) = α −1 for α ∈ Fq∗ .
(a) Show that τ (ab) = τ (a)τ (b) for all a, b ∈ Fq .
(b) Show that if q = 2, 3, or 4, then τ (a + b) = τ (a) + τ (b).
(c) Show that if τ (a + b) = τ (a) + τ (b) for all a, b ∈ Fq , then q = 2, 3, or 4. Hint: Let
α ∈ Fq with α + α^2 ≠ 0. Then set a = α and b = α^2 .
3.5 Subfields
In order to understand the structure of a finite field Fq , we must find the subfields contained
in Fq .
Recall that Fq has a primitive element of order q − 1 = p m − 1. If Fs is a subfield of Fq ,
then Fs has a primitive element of order s − 1 where (s − 1) | (q − 1). Because the identity
element 1 is the same for both Fq and Fs , Fs has characteristic p implying that s = pr . So it is
necessary to have ( pr − 1) | ( p m − 1). The following lemma shows when that can happen.
Lemma 3.5.1 Let a > 1 be an integer. Then (a r − 1) | (a m − 1) if and only if r | m.
Proof: If r | m, then m = rh and a^m − 1 = (a^r − 1) ∑_{i=0}^{h−1} a^{ir}. Conversely, by the Division
Algorithm, let m = rh + u, with 0 ≤ u < r . Then

(a^m − 1)/(a^r − 1) = a^u (a^{rh} − 1)/(a^r − 1) + (a^u − 1)/(a^r − 1).
As r | r h, by the above, (a r h − 1)/(a r − 1) is an integer; also (a u − 1)/(a r − 1) is strictly
less than 1. Thus for (a m − 1)/(a r − 1) to be an integer, (a u − 1)/(a r − 1) must be 0 and
so u = 0 yielding r | m.
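Lemma 3.5.1 is easy to sanity-check by brute force. The following short Python sketch is ours, testing the divisibility equivalence for a few small bases a:

```python
# Numeric check of Lemma 3.5.1: (a^r - 1) | (a^m - 1) if and only if r | m
for a in (2, 3, 5):
    for m in range(1, 25):
        for r in range(1, 25):
            divides = (a**m - 1) % (a**r - 1) == 0
            assert divides == (m % r == 0)
```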
The same argument used in the proof of Lemma 3.5.1 shows that (x s−1 − 1) | (x q−1 − 1)
if and only if (s − 1) | (q − 1). Thus (x s − x) | (x q − x) if and only if (s − 1) | (q − 1). So
if s = pr , (x s − x) | (x q − x) if and only if r | m by Lemma 3.5.1. So we have the following
lemma.
Lemma 3.5.2 Let s = pr and q = p m . Then (x s − x) | (x q − x) if and only if r | m.
In particular if r | m, all of the roots of x^{p^r} − x are in Fq . Exercise 177 shows that the
roots of x^{p^r} − x in Fq form a subfield of Fq . Since any subfield of order p^r must consist of
the roots of x^{p^r} − x in Fq , this subfield must be unique. The following theorem summarizes
this discussion and completely characterizes the subfield structure of Fq .
Theorem 3.5.3 When q = p m ,
(i) Fq has a subfield of order s = pr if and only if r | m,
(ii) the elements of the subfield Fs are exactly the elements of Fq that are roots of x s − x,
and
(iii) for each r | m there is only one subfield F pr of Fq .
Corollary 3.5.4 If γ is a primitive element of Fq and Fs is a subfield of Fq , then the
elements of Fs are {0, 1, α, . . . , α s−2 }, where α = γ (q−1)/(s−1) .
Exercise 177 If q = p^m and r | m, prove that the roots of x^{p^r} − x in Fq form a subfield
of Fq . Hint: See Exercise 152.
We can picture the subfield structure very nicely using a lattice as the next example
shows.
Example 3.5.5 The lattice of subfields of F2^24 is:

                F2^24
               /     \
          F2^8        F2^12
               \     /      \
                F2^4         F2^6
                    \       /    \
                     F2^2         F2^3
                         \       /
                          F2
If F is a subfield of E, we also say that E is an extension field of F. In the lattice of
Example 3.5.5 we connect two fields F and E if F ⊂ E with no proper subfields between;
the extension field E is placed above F. From this lattice one can find the intersection of
two subfields as well as the smallest subfield containing two other subfields.
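By Theorem 3.5.3 the lattice of subfields of F_{p^m} is just the divisor lattice of m, so it can be generated programmatically. This Python sketch is our illustration (the function names are ours); it recovers the nodes and covering edges of the diagram in Example 3.5.5.

```python
def subfield_exponents(m):
    # Theorem 3.5.3(i): F_{p^r} is a subfield of F_{p^m} iff r | m
    return [r for r in range(1, m + 1) if m % r == 0]

def covers(m):
    # edges of the subfield lattice: r -- s when F_{p^r} sits inside F_{p^s}
    # with no proper subfield strictly between them
    divs = subfield_exponents(m)
    return [(r, s) for r in divs for s in divs
            if r < s and s % r == 0
            and not any(r < t < s and t % r == 0 and s % t == 0 for t in divs)]

# Example 3.5.5: subfields of F_{2^24} correspond to the divisors of 24,
# and the Hasse diagram has ten edges
assert subfield_exponents(24) == [1, 2, 3, 4, 6, 8, 12, 24]
assert len(covers(24)) == 10
```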
Exercise 178 Draw the lattice of subfields of F230 .
Corollary 3.5.6 The prime subfield of Fq consists of those elements α in Fq that satisfy
the equation x p = x.
Exercise 179 Prove Corollary 3.5.6.
3.6 Field automorphisms
The field automorphisms of Fq form a group under function composition. We can describe
this group completely.
Recall that an automorphism σ of Fq is a bijection σ : Fq → Fq such that σ (α + β) =
σ (α) + σ (β) and σ (αβ) = σ (α)σ (β) for all α, β ∈ Fq .¹ Define σ p : Fq → Fq by

σ p (α) = α^p for all α ∈ Fq .

¹ In Section 1.7, where we used field automorphisms to define equivalence, the field automorphism σ acted on
the right x → (x)σ , because the monomial maps act most naturally on the right. Here we have σ act on the
left by x → σ (x), because it is probably more natural to the reader. They are interchangeable because two field
automorphisms σ and τ commute by Theorem 3.6.1, implying that the right action (x)(σ τ ) = ((x)σ )τ agrees
with the left action (σ τ )(x) = (τ σ )(x) = τ (σ (x)).
Obviously σ p (αβ) = σ p (α)σ p (β), and σ p (α + β) = σ p (α) + σ p (β) follows from Exercise 152. As σ p has kernel {0}, σ p is an automorphism of Fq , called the Frobenius automorphism. Analogously, define σ_{p^r}(α) = α^{p^r}.
The automorphism group of Fq , denoted Gal(Fq ), is called the Galois group of Fq . We
have the following theorem characterizing this group. Part (ii) of this theorem follows from
Corollary 3.5.6.
Theorem 3.6.1
(i) Gal(Fq ) is cyclic of order m and is generated by the Frobenius automorphism σ p .
(ii) The prime subfield of Fq is precisely the set of elements in Fq such that σ p (α) = α.
(iii) The subfield Fq of Fq t is precisely the set of elements in Fq t such that σq (α) = α.
We use σ p to denote the Frobenius automorphism of any field of characteristic p. If
E and F are fields of characteristic p with E an extension field of F, then the Frobenius
automorphism of E when restricted to F is the Frobenius automorphism of F. An element
α ∈ F is fixed by an automorphism σ of F provided σ (α) = α. Let r | m. Then σ pr generates
a cyclic subgroup of Gal(Fq ) of order m/r. By Exercises 180 and 181, the elements of Fq
fixed by this subgroup are precisely the elements of the subfield F pr . We let Gal(Fq : F pr )
denote the group of automorphisms of Fq that fix F pr elementwise. Our discussion shows that σ pr ∈ Gal(Fq : F pr ).
The following theorem strengthens this.
Theorem 3.6.2 Gal(Fq : F pr ) is the cyclic group generated by σ pr .
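The fixed-field statement of Theorem 3.6.1(ii) can be verified numerically in a small field. This Python sketch is ours: it builds F9 = F3 [x]/(x^2 + 1) (x^2 + 1 is irreducible over F3 since −1 is not a square mod 3), representing a + bα as the pair (a, b), and checks that the elements fixed by the Frobenius automorphism σ3 are exactly the prime subfield F3.

```python
p = 3
# F9 = F3[x]/(x^2 + 1); element a + b*alpha stored as (a, b), with alpha^2 = -1
def mul(u, v):
    a, b = u
    c, d = v
    # (a + b*alpha)(c + d*alpha) = (ac - bd) + (ad + bc)*alpha
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def frobenius(u):
    # sigma_3(u) = u^3
    return mul(mul(u, u), u)

elements = [(a, b) for a in range(p) for b in range(p)]
fixed = [u for u in elements if frobenius(u) == u]

# Theorem 3.6.1(ii): the fixed field of sigma_p is the prime subfield F3
assert fixed == [(0, 0), (1, 0), (2, 0)]
```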
Exercise 180 Let σ be an automorphism of a field F. Prove that the elements of F fixed
by σ form a subfield of F.
Exercise 181 Prove that if r | m, then the elements in Fq fixed by σ pr are exactly the
elements of the subfield F pr .
Exercise 182 Prove Theorem 3.6.2.
3.7 Cyclotomic cosets and minimal polynomials
Let E be a finite extension field of Fq . Then E is a vector space over Fq and so E = Fq t for
some positive integer t. By Theorem 3.3.2, each element α of E is a root of the polynomial
x^{q^t} − x. Thus there is a monic polynomial Mα (x) in Fq [x] of smallest degree which has α
as a root; this polynomial is called the minimal polynomial of α over Fq . In the following
theorem we collect some elementary facts about minimal polynomials.
Theorem 3.7.1 Let Fq t be an extension field of Fq and let α be an element of Fq t with
minimal polynomial Mα (x) in Fq [x]. The following hold:
(i) Mα (x) is irreducible over Fq .
(ii) If g(x) is any polynomial in Fq [x] satisfying g(α) = 0, then Mα (x) | g(x).
(iii) Mα (x) is unique; that is, there is only one monic polynomial in Fq [x] of smallest degree
which has α as a root.
Exercise 183 Prove Theorem 3.7.1.
If we begin with an irreducible polynomial f (x) over Fq of degree r , we can adjoin a
root of f (x) to Fq and obtain the field Fq r . Amazingly, all the roots of f (x) lie in Fq r .
Theorem 3.7.2 Let f (x) be a monic irreducible polynomial over Fq of degree r . Then:
(i) all the roots of f (x) lie in Fq r and in any field containing Fq along with one root of
f (x),
(ii) f (x) = ∏_{i=1}^{r} (x − αi ), where αi ∈ Fq r for 1 ≤ i ≤ r , and
(iii) f (x) | x^{q^r} − x.
Proof: Let α be a root of f (x) which we adjoin to Fq to form a field Eα with q r elements. If
β is another root of f (x), not in Eα , it is a root of some irreducible factor, over Eα , of f (x).
Adjoining β to Eα forms an extension field E of Eα . However, inside E, there is a subfield Eβ
obtained by adjoining β to Fq . Eβ must have q r elements as f (x) is irreducible of degree
r over Fq . Since Eα and Eβ are subfields of E of the same size, by Theorem 3.5.3(iii),
Eα = Eβ proving that all roots of f (x) lie in Fq r . Since any field containing Fq and one
root of f (x) contains Fq r , part (i) follows. Part (ii) now follows from Exercise 159. Part (iii)
follows from part (ii) and the fact that x^{q^r} − x = ∏_{α∈Fq r} (x − α) by Theorem 3.3.2.
In particular this theorem holds for minimal polynomials Mα (x) over Fq as such polynomials are monic irreducible.
Theorem 3.7.3 Let Fq t be an extension field of Fq and let α be an element of Fq t with
minimal polynomial Mα (x) in Fq [x]. The following hold:
(i) Mα (x) | x^{q^t} − x.
(ii) Mα (x) has distinct roots all lying in Fq t .
(iii) The degree of Mα (x) divides t.
(iv) x^{q^t} − x = ∏_α Mα (x), where α runs through some subset of Fq t which enumerates the
minimal polynomials of all elements of Fq t exactly once.
(v) x^{q^t} − x = ∏_f f (x), where f runs through all monic irreducible polynomials over Fq
whose degree divides t.
Proof: Part (i) follows from Theorem 3.7.1(ii), since α^{q^t} − α = 0 by Theorem 3.3.2. Since
the roots of x^{q^t} − x are the q^t elements of Fq t , x^{q^t} − x has distinct roots, and so (i) and
Theorem 3.7.2(i) imply (ii). By Theorem 3.4.1, x^{q^t} − x = ∏_{i=1}^{n} pi (x), where pi (x) is irreducible over Fq . As x^{q^t} − x has distinct roots, the factors pi (x) are distinct. By scaling
them, we may assume that each is monic as x^{q^t} − x is monic. So pi (x) = Mα (x) for any
α ∈ Fq t with pi (α) = 0. Thus (iv) holds. But if Mα (x) has degree r , adjoining α to Fq gives
the subfield Fq r = F p^{mr} of Fq t = F p^{mt} implying mr | mt, that is, r | t, by Theorem 3.5.3 and
hence (iii). Part (v) follows from (iv) if we show that every monic irreducible polynomial
over Fq of degree r dividing t is a factor of x^{q^t} − x. But f (x) | (x^{q^r} − x) by Theorem 3.7.2(iii).
Since mr | mt, (x^{q^r} − x) | (x^{q^t} − x) by Lemma 3.5.2.
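Part (v) of Theorem 3.7.3 can be checked by brute force for q = 2 and t = 4. The Python sketch below is our illustration (the bit-mask representation and names are ours): it finds every irreducible polynomial over F2 of degree dividing 4 by trial division and multiplies them together.

```python
def pmul(f, g):
    # multiply two F2[x] polynomials stored as integer bit masks
    r = 0
    while g:
        if g & 1:
            r ^= f
        f <<= 1
        g >>= 1
    return r

def pmod(f, g):
    # remainder of f modulo g in F2[x]
    dg = g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        f ^= g << (f.bit_length() - 1 - dg)
    return f

def irreducible(f):
    # trial division by every polynomial of degree 1 .. deg(f)//2
    d = f.bit_length() - 1
    return d >= 1 and all(pmod(f, g) for g in range(2, 1 << (d // 2 + 1)))

# product of all monic irreducible polynomials over F2 of degree dividing 4
prod = 1
for f in range(2, 1 << 5):        # all polynomials of degree 1..4
    d = f.bit_length() - 1
    if 4 % d == 0 and irreducible(f):
        prod = pmul(prod, f)

assert prod == (1 << 16) | 0b10   # x^16 - x (which is x^16 + x over F2)
```

The six factors found (x, x + 1, x^2 + x + 1, and the three irreducible quartics) have degrees summing to 16, as the theorem predicts.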
Two elements of Fq t which have the same minimal polynomial in Fq [x] are called
conjugate over Fq . It will be important to find all the conjugates of α ∈ Fq t , that is, all the
roots of Mα (x). We know by Theorem 3.7.3(ii) that the roots of Mα (x) are distinct and lie
in Fq t . We can find these roots with the assistance of the following theorem.
Theorem 3.7.4 Let f (x) be a polynomial in Fq [x] and let α be a root of f (x) in some
extension field Fq t . Then:
(i) f (x q ) = f (x)q , and
(ii) α^q is also a root of f (x) in Fq t .
Proof: Let f (x) = ∑_{i=0}^{n} ai x^i. Since q = p^m, where p is the characteristic of Fq , f (x)^q =
∑_{i=0}^{n} ai^q x^{iq} by applying Exercise 152 repeatedly. However, ai^q = ai , because ai ∈ Fq and
elements of Fq are roots of x^q − x by Theorem 3.3.2. Thus (i) holds. In particular, f (α^q ) =
f (α)^q = 0, implying (ii).
Repeatedly applying this theorem we see that α, α^q, α^{q^2}, . . . are all roots of M_α(x).
Where does this sequence stop? It will stop after r terms, where α^{q^r} = α. Suppose now
that γ is a primitive element of F_{q^t}. Then α = γ^s for some s. Hence α^{q^r} = α if and only
if γ^{sq^r − s} = 1. By Theorem 3.3.1(ii), sq^r ≡ s (mod q^t − 1). Based on this, we define the
q-cyclotomic coset of s modulo q^t − 1 to be the set

    C_s = {s, sq, . . . , sq^{r−1}} (mod q^t − 1),

where r is the smallest positive integer such that sq^r ≡ s (mod q^t − 1). The sets C_s partition
the set {0, 1, 2, . . . , q^t − 2} of integers into disjoint sets. When listing the cyclotomic cosets,
it is usual to list C_s only once, where s is the smallest element of the coset.
Example 3.7.5 The 2-cyclotomic cosets modulo 15 are C0 = {0}, C1 = {1, 2, 4, 8}, C3 =
{3, 6, 12, 9}, C5 = {5, 10}, and C7 = {7, 14, 13, 11}.
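Cosets like these are easy to generate mechanically by iterating i → iq (mod n). The following is a minimal sketch (the function and variable names are our own, not from the text):

```python
def cyclotomic_cosets(q, n):
    """Partition {0, 1, ..., n-1} into q-cyclotomic cosets modulo n.

    Each coset is the orbit of s under i -> i*q (mod n), keyed by its
    smallest element.  Assumes gcd(q, n) = 1.
    """
    cosets = {}
    seen = set()
    for s in range(n):
        if s in seen:
            continue
        orbit, i = [], s
        while i not in orbit:
            orbit.append(i)
            i = (i * q) % n
        cosets[s] = orbit
        seen.update(orbit)
    return cosets

# The 2-cyclotomic cosets modulo 15 from Example 3.7.5:
print(cyclotomic_cosets(2, 15))
# {0: [0], 1: [1, 2, 4, 8], 3: [3, 6, 12, 9], 5: [5, 10], 7: [7, 14, 13, 11]}
```

The same function answers Exercises 184–186 by changing q and n.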
Exercise 184 Compute the 2-cyclotomic cosets modulo:
(a) 7,
(b) 31,
(c) 63.
Exercise 185 Compute the 3-cyclotomic cosets modulo:
(a) 8,
(b) 26.
Exercise 186 Compute the 4-cyclotomic cosets modulo:
(a) 15,
(b) 63.
Compare your answers to those of Example 3.7.5 and Exercise 184.
We now know that the roots of M_α(x) = M_{γ^s}(x) include {γ^i | i ∈ C_s}. In fact these are
all of the roots. So if we know the size of C_s, we know the degree of M_{γ^s}(x), as these are
the same.
115
3.7 Cyclotomic cosets and minimal polynomials
Theorem 3.7.6 If γ is a primitive element of F_{q^t}, then the minimal polynomial of γ^s over
Fq is

    M_{γ^s}(x) = ∏_{i∈C_s} (x − γ^i).
Proof: This theorem is claiming that, when expanded, f(x) = ∏_{i∈C_s}(x − γ^i) = Σ_j f_j x^j
is a polynomial in Fq[x], not merely F_{q^t}[x]. Let g(x) = f(x)^q. Then g(x) = ∏_{i∈C_s}(x^q −
γ^{qi}) by Exercise 152. As C_s is a q-cyclotomic coset, qi runs through C_s as i does. Thus
g(x) = f(x^q) = Σ_j f_j x^{qj}. But g(x) = (Σ_j f_j x^j)^q = Σ_j f_j^q x^{qj}, again by Exercise 152.
Equating coefficients, we have f_j^q = f_j and hence, by Theorem 3.6.1(iii), f(x) ∈ Fq[x].
Exercise 187 Prove that the size r of a q-cyclotomic coset modulo q t − 1 satisfies r | t.
Example 3.7.7 The field F8 was constructed in Examples 3.4.2 and 3.4.3. In the table
below we give the minimal polynomial over F2 of each element of F8 and the associated
2-cyclotomic coset modulo 7.

    Roots            Minimal polynomial    2-cyclotomic coset
    0                x                     —
    1                x + 1                 {0}
    α, α^2, α^4      x^3 + x + 1           {1, 2, 4}
    α^3, α^5, α^6    x^3 + x^2 + 1         {3, 5, 6}

Notice that x^8 − x = x(x + 1)(x^3 + x + 1)(x^3 + x^2 + 1) is the factorization of x^8 − x into
irreducible polynomials in F2[x], consistent with Theorem 3.7.3(iv). The polynomials x^3 +
x + 1 and x^3 + x^2 + 1 are primitive polynomials of degree 3.
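The factorization can be checked by multiplying the factors back together. A minimal sketch over F2, representing each polynomial as a bitmask whose bit i is the coefficient of x^i (this encoding is our own choice, not from the text):

```python
from functools import reduce

def gf2_mul(a, b):
    """Carry-less product of two GF(2)[x] polynomials stored as bitmasks."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # add a * x^(current power of the other factor)
        a <<= 1
        b >>= 1
    return r

# x, x + 1, x^3 + x + 1, x^3 + x^2 + 1
factors = [0b10, 0b11, 0b1011, 0b1101]
product = reduce(gf2_mul, factors, 1)
assert product == (1 << 8) | (1 << 1)   # x^8 + x, i.e. x^8 - x over F2
```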
Example 3.7.8 In Exercise 173, you are asked to construct F16 using the irreducible
polynomial x^4 + x + 1 over F2. With α as a root of this polynomial, we give the minimal
polynomial over F2 of each element of F16 and the associated 2-cyclotomic coset modulo
15 in the table below.
    Roots                    Minimal polynomial         2-cyclotomic coset
    0                        x                          —
    1                        x + 1                      {0}
    α, α^2, α^4, α^8         x^4 + x + 1                {1, 2, 4, 8}
    α^3, α^6, α^9, α^12      x^4 + x^3 + x^2 + x + 1    {3, 6, 9, 12}
    α^5, α^10                x^2 + x + 1                {5, 10}
    α^7, α^11, α^13, α^14    x^4 + x^3 + 1              {7, 11, 13, 14}

The factorization of x^15 − 1 into irreducible polynomials in F2[x] is

    x^15 − 1 = (x + 1)(x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)(x^4 + x^3 + 1),

again consistent with Theorem 3.7.3(iv).
Exercise 188 Referring to Example 3.7.8:
(a) verify that the table is correct,
(b) find the elements of F16 that make up the subfields F2 and F4 , and
(c) find which irreducible polynomials of degree 4 are primitive.
Exercise 189 The irreducible polynomials of degree 2 over F3 were found in Exercise 170.
(a) Which of the irreducible polynomials of degree 2 are primitive?
(b) Let α be a root of one of these primitive polynomials. Construct the field F9 by adjoining
α to F3 and giving a table with each vector in F3^2 associated to 0 and the powers of α.
(c) In a table as in Examples 3.7.7 and 3.7.8, give the minimal polynomial over F3 of each
element of F9 and the associated 3-cyclotomic coset modulo 8.
(d) Verify that the product of all the polynomials in your table is x^9 − x.
Exercise 190 Without factoring x^63 − 1, how many irreducible factors does it have over
F2 and what are their degrees? Answer the same question about x^63 − 1 over F4. See
Exercises 184 and 186.
Exercise 191 Without factoring x^26 − 1, how many irreducible factors does it have over
F3 and what are their degrees? See Exercise 185.
Exercise 192 Let f(x) = f_0 + f_1x + · · · + f_ax^a be a polynomial of degree a in Fq[x].
The reciprocal polynomial of f(x) is the polynomial

    f*(x) = x^a f(x^{−1}) = f_a + f_{a−1}x + · · · + f_0x^a.
(We will study reciprocal polynomials further in Chapter 4.)
(a) Give the reciprocal polynomial of each of the polynomials in the table of Example 3.7.7.
(b) Give the reciprocal polynomial of each of the polynomials in the table of Example 3.7.8.
(c) What do you notice about the roots and the irreducibility of the reciprocal polynomials
that you found in parts (a) and (b)?
(d) How can you use what you have learned about reciprocal polynomials in parts (a), (b),
and (c) to help find irreducible factors of x^q − x over F_p, where q = p^m with p a prime?
3.8 Trace and subfield subcodes
Suppose that we have a code C over a field F_{q^t}. It is possible that some of the codewords
have all their components in the subfield Fq. Can much be said about the code consisting
of such codewords? Using the trace function, a surprising amount can be said.
Let C be an [n, k] code over F_{q^t}. The subfield subcode C|Fq of C with respect to Fq is
the set of codewords in C each of whose components is in Fq. Because C is linear over F_{q^t},
C|Fq is a linear code over Fq.
We first describe how to find a parity check matrix for C|Fq beginning with a parity check
matrix H of C. Because F_{q^t} is a vector space of dimension t over Fq, we can choose a
basis {b_1, b_2, . . . , b_t} of F_{q^t} over Fq. Each element z ∈ F_{q^t} can be uniquely written
as z = z_1b_1 + · · · + z_tb_t, where z_i ∈ Fq for 1 ≤ i ≤ t. Associate to z the t × 1 column
vector ẑ = [z_1 · · · z_t]^T. Create Ĥ from H by replacing each entry h by ĥ. Because H is an
(n − k) × n matrix with entries in F_{q^t}, Ĥ is a t(n − k) × n matrix over Fq. The rows of
Ĥ may be dependent. So a parity check matrix for C|Fq is obtained from Ĥ by deleting
dependent rows; the details of this are left as an exercise. Denote this parity check matrix
by H|Fq.
Exercise 193 Prove that by deleting dependent rows from Ĥ, a parity check matrix for
C|Fq is obtained.
Example 3.8.1 A parity check matrix for the [6, 3, 4] hexacode G_6 over F4 given in
Example 1.3.4 is

        [1  ω  ω  1  0  0]
    H = [ω  1  ω  0  1  0].
        [ω  ω  1  0  0  1]

The set {1, ω} is a basis of F4 over F2, so 0, 1, ω, and ω̄ = 1 + ω are associated to the
columns [0 0]^T, [1 0]^T, [0 1]^T, and [1 1]^T, respectively. Thus

        [1 0 0 1 0 0]
        [0 1 1 0 0 0]
    Ĥ = [0 1 0 0 1 0]
        [1 0 1 0 0 0]
        [0 0 1 0 0 1]
        [1 1 0 0 0 0]

and, deleting the dependent last row (the sixth row is the sum of rows 2 and 4),

            [1 0 0 1 0 0]
            [0 1 1 0 0 0]
    H|F2 =  [0 1 0 0 1 0].
            [1 0 1 0 0 0]
            [0 0 1 0 0 1]

So we see that G_6|F2 is the [6, 1, 6] binary repetition code.
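The passage from H to Ĥ is purely mechanical, so it is easy to check by machine. Below is a sketch for the matrix of this example; the integer encoding of F4 (elements 0, 1, ω, ω̄ as 0, 1, 2, 3, addition as XOR) is our own convention, not from the text:

```python
W = 2   # our encoding of w; the coordinates of z in the basis {1, w} are (z & 1, z >> 1)

H = [[1, W, W, 1, 0, 0],
     [W, 1, W, 0, 1, 0],
     [W, W, 1, 0, 0, 1]]          # parity check matrix of the hexacode

# Each row of H contributes two binary rows of H-hat: the coefficient-of-1
# row followed by the coefficient-of-w row.
H_hat = []
for row in H:
    H_hat.append([z & 1 for z in row])
    H_hat.append([z >> 1 for z in row])

# Every one of the t(n-k) = 6 rows has even weight, so the all-ones vector
# of the repetition code passes every check.
assert all(sum(r) % 2 == 0 for r in H_hat)

def gf2_rank(rows):
    """Rank over F2, packing each row into an integer."""
    vecs = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    while vecs:
        v = vecs.pop()
        if v == 0:
            continue
        rank += 1
        top = v.bit_length() - 1
        vecs = [u ^ v if (u >> top) & 1 else u for u in vecs]
    return rank

assert gf2_rank(H_hat) == 5   # one dependent row, so C|F2 has dimension 6 - 5 = 1
```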
Exercise 194 The [6, 3, 4] code C over F4 with generator matrix G given by

        [1  0  0  1  1  1]
    G = [0  1  0  1  ω  ω̄]
        [0  0  1  1  ω̄  ω]

is the hexacode (equivalent to but not equal to the code of Example 3.8.1). Using the basis
{1, ω} of F4 over F2, find a parity check matrix for C|F2.
Example 3.8.2 In Examples 3.4.2 and 3.4.3, we constructed F8 by adjoining a primitive
element α to F2, where α^3 = α + 1. Consider the three codes C_1, C_2, and C_3 of length 7 over
F8 given by the following parity check matrices, respectively:

    H_1 = [1  α  α^2  α^3  α^4  α^5  α^6],

    H_2 = [1  α    α^2  α^3  α^4  α^5  α^6]
          [1  α^2  α^4  α^6  α    α^3  α^5],  and

    H_3 = [1  α    α^2  α^3  α^4  α^5  α^6]
          [1  α^3  α^6  α^2  α^5  α    α^4].
The code C_1 is a [7, 6, 2] MDS code, while C_2 and C_3 are [7, 5, 3] MDS codes. Choosing
{1, α, α^2} as a basis of F8 over F2, we obtain the following parity check matrices for C_1|F2,
C_2|F2, and C_3|F2:

                      [1 0 0 1 0 1 1]
    H_1|F2 = H_2|F2 = [0 1 0 1 1 1 0]   and
                      [0 0 1 0 1 1 1]

              [1 0 0 1 0 1 1]
              [0 1 0 1 1 1 0]
    H_3|F2 =  [0 0 1 0 1 1 1].
              [1 1 1 0 1 0 0]
              [0 1 0 0 1 1 1]
              [0 0 1 1 1 0 1]

So C_1|F2 = C_2|F2 are both representations of the [7, 4, 3] binary Hamming code H_3, whereas
C_3|F2 is the [7, 1, 7] binary repetition code.
Exercise 195 Verify all the claims in Example 3.8.2.
This example illustrates that there is no elementary relationship between the dimension
of a code and the dimension of its subfield subcode. The following theorem does exhibit a
lower bound on the dimension of a subfield subcode; its proof is left as an exercise.
Theorem 3.8.3 Let C be an [n, k] code over F_{q^t}. Then:
(i) C|Fq is an [n, k_q] code over Fq, where k_q ≥ n − t(n − k), and
(ii) if the entries of a monomial matrix M ∈ ΓAut(C) belong to Fq, then M ∈ ΓAut(C|Fq).
Exercise 196 Prove Theorem 3.8.3.
An upper bound on the dimension of a subfield subcode is given by the following
theorem.
Theorem 3.8.4 Let C be an [n, k] code over F_{q^t}. Then C|Fq is an [n, k_q] code over Fq,
where k_q ≤ k. If C has a basis of codewords in Fq^n, then this is also a basis of C|Fq and C|Fq
is k-dimensional.
Proof: Let G be a generator matrix of C|Fq. Then G has k_q independent rows; thus the rank
of G is k_q when considered as a matrix with entries in Fq or in F_{q^t}. Hence the rows of G
remain independent in F_{q^t}^n, implying all parts of the theorem.
Another natural way to construct a code over Fq from a code over F_{q^t} is to use the trace
function Tr_t : F_{q^t} → Fq defined by

    Tr_t(α) = Σ_{i=0}^{t−1} α^{q^i} = Σ_{i=0}^{t−1} σ_q^i(α),   for all α ∈ F_{q^t},

where σ_q = σ_p^m and σ_q(α) = α^q. Furthermore, σ_q^t(α) = α^{q^t} = α, as every element of F_{q^t} is
a root of x^{q^t} − x by Theorem 3.3.2. So σ_q^t is the identity and therefore σ_q(Tr_t(α)) = Tr_t(α).
Thus Tr_t(α) is a root of x^q − x and hence is in Fq as required by the definition of trace.
Exercise 197 Fill in the missing steps that show that σ_q(Tr_t(α)) = Tr_t(α).
Exercise 198 Using the notation of Examples 3.4.2 and 3.4.3, produce a table of values
of Tr_3(β) for all β ∈ F8, where Tr_3 : F8 → F2.
As indicated in the following lemma, the trace function is a nontrivial linear functional
on the vector space F_{q^t} over Fq.
Lemma 3.8.5 The following properties hold for Tr_t : F_{q^t} → Fq:
(i) Tr_t is not identically zero,
(ii) Tr_t(α + β) = Tr_t(α) + Tr_t(β), for all α, β ∈ F_{q^t}, and
(iii) Tr_t(aα) = aTr_t(α), for all α ∈ F_{q^t} and all a ∈ Fq.
Proof: Part (i) is clear because Tr_t(x) is a polynomial in x of degree q^{t−1} and hence has
at most q^{t−1} roots in F_{q^t} by Exercise 159. Parts (ii) and (iii) follow from the facts that
σ_q(α + β) = σ_q(α) + σ_q(β) and σ_q(aα) = aσ_q(α) for all α, β ∈ F_{q^t} and all a ∈ Fq, since
σ_q is a field automorphism that fixes Fq.
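These properties are easy to confirm by brute force in a small field. A sketch for t = 3, q = 2, using F8 = F2[α]/(α^3 + α + 1) with elements stored as 3-bit integers (this representation matches Examples 3.4.2 and 3.4.3; the function names are our own):

```python
def gf8_mul(a, b):
    """Multiply in F8 = F2[a]/(a^3 + a + 1); elements are 3-bit ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:          # reduce by a^3 = a + 1
            a ^= 0b1011
    return r

def tr3(x):
    """Tr_3(x) = x + x^2 + x^4 over F8."""
    x2 = gf8_mul(x, x)
    return x ^ x2 ^ gf8_mul(x2, x2)

# (i) not identically zero, traces land in F2, and (ii) F2-linearity:
assert any(tr3(x) == 1 for x in range(8))
for x in range(8):
    assert tr3(x) in (0, 1)
    for y in range(8):
        assert tr3(x ^ y) == tr3(x) ^ tr3(y)
```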
The trace of a vector c = (c_1, c_2, . . . , c_n) ∈ F_{q^t}^n is defined by Tr_t(c) = (Tr_t(c_1),
Tr_t(c_2), . . . , Tr_t(c_n)). The trace of a linear code C of length n over F_{q^t} is defined by
Tr_t(C) = {Tr_t(c) | c ∈ C}.
The trace of C is a linear code of length n over Fq . The following theorem of Delsarte [64]
exhibits a dual relation between subfield subcodes and trace codes.
Theorem 3.8.6 (Delsarte) Let C be a linear code of length n over F_{q^t}. Then

    (C|Fq)^⊥ = Tr_t(C^⊥).

Proof: We first show that (C|Fq)^⊥ ⊇ Tr_t(C^⊥). Let c = (c_1, c_2, . . . , c_n) be in C^⊥ and let
b = (b_1, . . . , b_n) be in C|Fq. Then by Lemma 3.8.5(ii) and (iii),

    Tr_t(c) · b = Σ_{i=1}^{n} Tr_t(c_i)b_i = Tr_t(Σ_{i=1}^{n} c_ib_i) = Tr_t(0) = 0,

as c ∈ C^⊥ and b ∈ C|Fq ⊆ C. Thus Tr_t(c) ∈ (C|Fq)^⊥.
We now show that (C|Fq)^⊥ ⊆ Tr_t(C^⊥) by showing that (Tr_t(C^⊥))^⊥ ⊆ C|Fq. Let a =
(a_1, . . . , a_n) ∈ (Tr_t(C^⊥))^⊥. Since a_i ∈ Fq for 1 ≤ i ≤ n, we need only show that a ∈ C,
and for this it suffices to show that a · b = 0 for all b ∈ C^⊥. Let b = (b_1, . . . , b_n) be a vector
in C^⊥. Then βb ∈ C^⊥ for all β ∈ F_{q^t}, and

    0 = a · Tr_t(βb) = Σ_{i=1}^{n} a_iTr_t(βb_i) = Tr_t(β Σ_{i=1}^{n} a_ib_i) = Tr_t(βx),

where x = a · b, by Lemma 3.8.5(ii) and (iii). If x ≠ 0, then we contradict (i) of Lemma 3.8.5.
Hence x = 0 and the theorem follows.
Example 3.8.7 Delsarte’s theorem can be used to compute the subfield subcodes of the
codes in Example 3.8.2. For instance, C_1^⊥ has generator matrix

    H_1 = [1  α  α^2  α^3  α^4  α^5  α^6].

The seven nonzero vectors of C_1^⊥ are α^i(1, α, α^2, . . . , α^6) for 0 ≤ i ≤ 6. Using Exercise 198,
we have Tr_3(α^i) = 1 for i ∈ {0, 3, 5, 6} and Tr_3(α^i) = 0 for i ∈ {1, 2, 4}. Thus the nonzero
vectors of (C_1|F2)^⊥ = Tr_3(C_1^⊥) are the seven cyclic shifts of 1001011, namely 1001011,
1100101, 1110010, 0111001, 1011100, 0101110, and 0010111. A generator matrix for
(C_1|F2)^⊥ is obtained by taking three linearly independent cyclic shifts of 1001011. Such a
matrix is the parity check matrix of C_1|F2 given in Example 3.8.2.
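This computation is easy to replicate in the representation of F8 with α^3 = α + 1 and elements stored as 3-bit integers (an encoding choice of ours): applying Tr_3 coordinatewise to the scalar multiples α^i(1, α, . . . , α^6) yields exactly the seven cyclic shifts of 1001011.

```python
def gf8_mul(a, b):
    """Multiply in F8 = F2[a]/(a^3 + a + 1); elements are 3-bit ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011       # reduce by a^3 = a + 1
    return r

def tr3(x):
    x2 = gf8_mul(x, x)
    return x ^ x2 ^ gf8_mul(x2, x2)

alpha = 0b010
powers = [1]
for _ in range(6):
    powers.append(gf8_mul(powers[-1], alpha))    # 1, a, a^2, ..., a^6

# Tr3 applied coordinatewise to the codewords a^i * (1, a, ..., a^6):
words = set()
for i in range(7):
    scaled = [gf8_mul(powers[i], p) for p in powers]
    words.add("".join(str(tr3(c)) for c in scaled))

# The seven distinct cyclic shifts of 1001011:
shifts = {"1001011"[-k:] + "1001011"[:-k] for k in range(7)}
assert words == shifts
```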
Exercise 199 Verify all the claims made in Example 3.8.7.
Exercise 200 Use Delsarte’s Theorem to find the subfield subcodes of C 2 and C 3 from
Example 3.8.2 in an analogous manner to that given in Example 3.8.7.
If C is a code of length n over F_{q^t} which has a basis of vectors in Fq^n, the minimum weight
vectors in C and C|Fq are the same up to scalar multiplication, as we now see. In particular
the minimum weight vectors in the two codes have the same set of supports, where the
support of a vector c, denoted supp(c), is the set of coordinates where c is nonzero.
Theorem 3.8.8 Let C be an [n, k, d] code over F_{q^t}. Assume that C has a basis of codewords
in Fq^n. Then every vector in C of weight d is a multiple of a vector of weight d in C|Fq.
Proof: Let {b_1, . . . , b_k} be a basis of C with b_i ∈ Fq^n for 1 ≤ i ≤ k. Then {b_1, . . . , b_k} is
also a basis of C|Fq, by Theorem 3.8.4. Let c = c_1 · · · c_n = Σ_{i=1}^{k} α_ib_i, with α_i ∈ F_{q^t}. Thus
c ∈ C, and by Lemma 3.8.5, Tr_t(c) = Σ_{i=1}^{k} Tr_t(α_i)b_i ∈ C|Fq ⊆ C. Also wt(Tr_t(c)) ≤ wt(c)
as Tr_t(0) = 0. Assume now that c has weight d. Then either Tr_t(c) = 0 or Tr_t(c) has weight
d with supp(c) = supp(Tr_t(c)). By replacing c by αc for some α ∈ F_{q^t}, we may assume
that Tr_t(c) ≠ 0 using Lemma 3.8.5(i). Choose i so that c_i ≠ 0. As supp(c) = supp(Tr_t(c)),
Tr_t(c_i) ≠ 0 and wt(c_iTr_t(c) − Tr_t(c_i)c) < d. Thus c_iTr_t(c) − Tr_t(c_i)c = 0 and c is a multiple
of Tr_t(c).
4 Cyclic codes
We now turn to the study of an extremely important class of codes known as cyclic codes.
Many families of codes including the Golay codes, the binary Hamming codes, and codes
equivalent to the Reed–Muller codes are either cyclic or extended cyclic codes. The study
of cyclic codes began with two 1957 and 1959 AFCRL reports by E. Prange. The 1961
book by W. W. Peterson [255] compiled extensive results about cyclic codes and laid the
framework for much of the present-day theory; it also stimulated much of the subsequent
research on cyclic codes. In 1972 this book was expanded and published jointly by Peterson
and E. J. Weldon [256].
In studying cyclic codes of length n, it is convenient to label the coordinate positions as
0, 1, . . . , n − 1 and think of these as the integers modulo n. A linear code C of length n over
Fq is cyclic provided that for each vector c = c0 · · · cn−2 cn−1 in C the vector cn−1 c0 · · · cn−2 ,
obtained from c by the cyclic shift of coordinates i → i + 1 (mod n), is also in C. So a
cyclic code contains all n cyclic shifts of any codeword. Hence it is convenient to think
of the coordinate positions cyclically where, once you reach n − 1, you begin again with
coordinate 0. When we speak of consecutive coordinates, we will always mean consecutive
in that cyclical sense.
When examining cyclic codes over Fq, we will most often represent the codewords in
polynomial form. There is a bijective correspondence between the vectors c = c_0c_1 · · · c_{n−1}
in Fq^n and the polynomials c(x) = c_0 + c_1x + · · · + c_{n−1}x^{n−1} in Fq[x] of degree at most
n − 1. We order the terms of our polynomials from smallest to largest degree. We allow
ourselves the latitude of using the vector notation c and the polynomial notation c(x)
interchangeably. Notice that if c(x) = c_0 + c_1x + · · · + c_{n−1}x^{n−1}, then xc(x) = c_{n−1}x^n +
c_0x + c_1x^2 + · · · + c_{n−2}x^{n−1}, which would represent the codeword c cyclically shifted one
to the right if x^n were set equal to 1. More formally, the fact that a cyclic code C is invariant
under a cyclic shift implies that if c(x) is in C, then so is xc(x), provided we multiply modulo
x^n − 1. This suggests that the proper context for studying cyclic codes is the residue class
ring

    R_n = Fq[x]/(x^n − 1).

Under the correspondence of vectors with polynomials as given above, cyclic codes are
ideals of R_n and ideals of R_n are cyclic codes. Thus the study of cyclic codes in Fq^n is
equivalent to the study of ideals in R_n. The study of ideals in R_n hinges on factoring
x^n − 1, a topic we now explore.
4.1 Factoring x^n − 1
We are interested in finding the irreducible factors of x^n − 1 over Fq. Two possibilities
arise: either x^n − 1 has repeated irreducible factors or it does not. The study of cyclic
codes has primarily focused on the latter case. By Exercise 201, x^n − 1 has no repeated
factors if and only if q and n are relatively prime, an assumption we make throughout this
chapter.
Exercise 201 For f(x) = a_0 + a_1x + · · · + a_nx^n ∈ Fq[x], define the formal derivative
of f(x) to be the polynomial f′(x) = a_1 + 2a_2x + 3a_3x^2 + · · · + na_nx^{n−1} ∈ Fq[x]. From
this definition, show that the following rules hold:
(a) (f + g)′(x) = f′(x) + g′(x).
(b) (fg)′(x) = f′(x)g(x) + f(x)g′(x).
(c) (f(x)^m)′ = m(f(x))^{m−1}f′(x) for all positive integers m.
(d) If f(x) = f_1(x)^{a_1}f_2(x)^{a_2} · · · f_n(x)^{a_n}, where a_1, . . . , a_n are positive integers and
f_1(x), . . . , f_n(x) are distinct and irreducible over Fq, then

    f(x)/gcd(f(x), f′(x)) = f_1(x) · · · f_n(x).

(e) Show that f(x) has no repeated irreducible factors if and only if f(x) and f′(x) are
relatively prime.
(f) Show that x^n − 1 has no repeated irreducible factors if and only if q and n are relatively
prime.
To help factor x^n − 1 over Fq, it is useful to find an extension field F_{q^t} of Fq that
contains all of its roots. In other words, F_{q^t} must contain a primitive nth root of unity,
which occurs precisely when n | (q^t − 1) by Theorem 3.3.3. Define the order ord_n(q) of
q modulo n to be the smallest positive integer a such that q^a ≡ 1 (mod n). Notice that if
t = ord_n(q), then F_{q^t} contains a primitive nth root of unity α, but no smaller extension field
of Fq contains such a primitive root. As the α^i are distinct for 0 ≤ i < n and (α^i)^n = 1,
F_{q^t} contains all the roots of x^n − 1. Consequently, F_{q^t} is called a splitting field of x^n − 1
over Fq. So the irreducible factors of x^n − 1 over Fq must be the distinct minimal
polynomials of the nth roots of unity in F_{q^t}. Suppose that γ is a primitive element
of F_{q^t}. Then α = γ^d is a primitive nth root of unity, where d = (q^t − 1)/n. The roots
of M_{α^s}(x) are {γ^{ds}, γ^{dsq}, γ^{dsq^2}, . . . , γ^{dsq^{r−1}}} = {α^s, α^{sq}, α^{sq^2}, . . . , α^{sq^{r−1}}}, where r is the
smallest positive integer such that dsq^r ≡ ds (mod q^t − 1), by Theorem 3.7.6. But dsq^r ≡
ds (mod q^t − 1) if and only if sq^r ≡ s (mod n).
This leads us to extend the notion of q-cyclotomic cosets first developed in Section 3.7.
Let s be an integer with 0 ≤ s < n. The q-cyclotomic coset of s modulo n is the set

    C_s = {s, sq, . . . , sq^{r−1}} (mod n),

where r is the smallest positive integer such that sq^r ≡ s (mod n). It follows that C_s is the
orbit of the permutation i → iq (mod n) that contains s. The distinct q-cyclotomic cosets
modulo n partition the set of integers {0, 1, 2, . . . , n − 1}. As before, we normally denote a
cyclotomic coset in this partition by choosing s to be the smallest integer contained in the
cyclotomic coset. In Section 3.7 we had studied the more restricted case where n = q^t − 1.
Notice that ord_n(q) is the size of the q-cyclotomic coset C_1 modulo n. This discussion gives
the following theorem.
Theorem 4.1.1 Let n be a positive integer relatively prime to q. Let t = ord_n(q). Let α be
a primitive nth root of unity in F_{q^t}.
(i) For each integer s with 0 ≤ s < n, the minimal polynomial of α^s over Fq is

    M_{α^s}(x) = ∏_{i∈C_s} (x − α^i),

where C_s is the q-cyclotomic coset of s modulo n.
(ii) The conjugates of α^s are the elements α^i with i ∈ C_s.
(iii) Furthermore,

    x^n − 1 = ∏_s M_{α^s}(x)

is the factorization of x^n − 1 into irreducible factors over Fq, where s runs through a
set of representatives of the q-cyclotomic cosets modulo n.
Example 4.1.2 The 2-cyclotomic cosets modulo 9 are C_0 = {0}, C_1 = {1, 2, 4, 8, 7, 5},
and C_3 = {3, 6}. So ord_9(2) = 6 and the primitive ninth roots of unity lie in F64 but in no
smaller extension field of F2. The irreducible factors of x^9 − 1 over F2 have degrees 1,
6, and 2. These are the polynomials M_1(x) = x + 1, M_α(x), and M_{α^3}(x), where α is a
primitive ninth root of unity in F64. The only irreducible polynomial of degree 2 over F2
is x^2 + x + 1, which must therefore be M_{α^3}(x). (Notice also that α^3 is a primitive third
root of unity, which must lie in the subfield F4 of F64.) Hence the factorization of x^9 − 1 is
x^9 − 1 = (x + 1)(x^2 + x + 1)(x^6 + x^3 + 1) and M_α(x) = x^6 + x^3 + 1.
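The claimed factorization can be verified by multiplying out over F2; the bitmask encoding below (bit i = coefficient of x^i) is our own convention:

```python
def gf2_mul(a, b):
    """Product in F2[x]; polynomials are bitmasks with bit i = coeff of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# (x + 1)(x^2 + x + 1)(x^6 + x^3 + 1) = x^9 + 1 = x^9 - 1 over F2
product = gf2_mul(gf2_mul(0b11, 0b111), 0b1001001)
assert product == (1 << 9) | 1

# The factor degrees 1, 2, 6 are the sizes of the 2-cyclotomic cosets
# C_0, C_3, C_1 modulo 9, as Theorem 4.1.1 predicts.
```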
Example 4.1.3 The 3-cyclotomic cosets modulo 13 are C_0 = {0}, C_1 = {1, 3, 9}, C_2 =
{2, 6, 5}, C_4 = {4, 12, 10}, and C_7 = {7, 8, 11}. So ord_13(3) = 3 and the primitive 13th
roots of unity lie in F27 but in no smaller extension field of F3. The irreducible factors of
x^13 − 1 over F3 have degrees 1, 3, 3, 3, and 3. These are the polynomials M_1(x) = x − 1,
M_α(x), M_{α^2}(x), M_{α^4}(x), and M_{α^7}(x), where α is a primitive 13th root of unity in F27.
In these examples we notice that the size of each q-cyclotomic coset is a divisor of
ordn (q). This holds in general.
Theorem 4.1.4 The size of each q-cyclotomic coset is a divisor of ordn (q). Furthermore,
the size of C1 is ordn (q).
Proof: Let t = ord_n(q) and let m be the size of C_s. Then M_{α^s}(x) has degree m, where α is a
primitive nth root of unity. So m | t by Theorem 3.7.3. The fact that the size of C_1 is ord_n(q)
follows directly from the definitions of q-cyclotomic cosets and ord_n(q), as mentioned prior
to Theorem 4.1.1.
Exercise 202 Let q = 2.
(a) Find the q-cyclotomic cosets modulo n where n is:
(i) 23,
(ii) 45.
(b) Find ordn (q) for the two values of n given in part (a).
(c) What are the degrees of the irreducible factors of x^n − 1 over Fq for the two values of
n given in part (a)?
Exercise 203 Repeat Exercise 202 with q = 3 and n = 28 and n = 41.
Exercise 204 Factor x^13 − 1 over F3.
We conclude this section by noting that there are efficient computer algorithms for factoring polynomials over a finite field, among them the algorithm of Berlekamp, MacWilliams,
and Sloane [18]. Many of the algebraic manipulation software packages can factor x^n − 1
over a finite field for reasonably sized integers n. There are also extensive tables in [256]
listing all irreducible polynomials over F2 for n ≤ 34.
4.2 Basic theory of cyclic codes
We noted earlier that cyclic codes over Fq are precisely the ideals of

    R_n = Fq[x]/(x^n − 1).

Exercises 153 and 166 show that Fq[x] is a principal ideal domain. It is straightforward then
to show that the ideals of R_n are also principal, and hence cyclic codes are the principal
ideals of R_n. When writing a codeword of a cyclic code as c(x), we technically mean the
coset c(x) + (x^n − 1) in R_n. However, such notation is too cumbersome, and we will write
c(x) even when working in R_n. Thus we think of the elements of R_n as the polynomials
in Fq[x] of degree less than n with multiplication being carried out modulo x^n − 1. So
when working in R_n, to multiply two polynomials, we multiply them as we would in
Fq[x] and then replace any term of the form ax^{ni+j}, where 0 ≤ j < n, by ax^j. We see
immediately that when writing a polynomial as both an element in Fq[x] and an element
in R_n, confusion can easily arise. The reader should be aware of which ring is being
considered.
As stated earlier, throughout our study of cyclic codes, we make the basic assumption that
the characteristic p of Fq does not divide the length n of the cyclic codes being considered,
or equivalently, that gcd(n, q) = 1. The primary reason for this assumption is that x^n − 1 then has
distinct roots in an extension field of Fq by Exercise 201, and this enables us to describe its
roots (and, as we shall see, cyclic codes) by q-cyclotomic cosets modulo n. The assumption
that p does not divide n also implies that R_n is semi-simple and thus that the Wedderburn
Structure Theorems apply; we shall not invoke these structure theorems, preferring rather
to derive the needed consequences for the particular case of R_n. The theory of cyclic codes
with gcd(n, q) ≠ 1 is discussed in [49, 201], but, to date, these “repeated root” cyclic codes
do not seem to be of much interest.
To distinguish the principal ideal (g(x)) of Fq[x] from that ideal in R_n, we use the
notation ⟨g(x)⟩ for the principal ideal of R_n generated by g(x). We now show that there
is a bijective correspondence between the cyclic codes in R_n and the monic polynomial
divisors of x^n − 1.
Theorem 4.2.1 Let C be a nonzero cyclic code in R_n. There exists a polynomial g(x) ∈ C
with the following properties:
(i) g(x) is the unique monic polynomial of minimum degree in C,
(ii) C = ⟨g(x)⟩, and
(iii) g(x) | (x^n − 1).
Let k = n − deg g(x), and let g(x) = Σ_{i=0}^{n−k} g_ix^i, where g_{n−k} = 1. Then:
(iv) the dimension of C is k and {g(x), xg(x), . . . , x^{k−1}g(x)} is a basis for C,
(v) every element of C is uniquely expressible as a product g(x)f(x), where f(x) = 0 or
deg f(x) < k,
(vi) the k × n matrix

        [g_0  g_1  g_2  · · ·  g_{n−k}    0        · · ·  0      ]   ← g(x)
    G = [0    g_0  g_1  · · ·  g_{n−k−1}  g_{n−k}  · · ·  0      ]   ← xg(x)
        [· · ·                 · · ·                      · · ·  ]
        [0    · · ·  0    g_0  · · ·                      g_{n−k}]   ← x^{k−1}g(x)

is a generator matrix for C, and
(vii) if α is a primitive nth root of unity in some extension field of Fq, then

    g(x) = ∏_s M_{α^s}(x),

where the product is over a subset of representatives of the q-cyclotomic cosets
modulo n.
Proof: Let g(x) be a monic polynomial of minimum degree in C. Since C is nonzero,
such a polynomial exists. If c(x) ∈ C, then by the Division Algorithm in Fq[x], c(x) =
g(x)h(x) + r(x), where either r(x) = 0 or deg r(x) < deg g(x). As C is an ideal in R_n,
r(x) ∈ C and the minimality of the degree of g(x) implies r(x) = 0. This gives (i) and (ii).
By the Division Algorithm x^n − 1 = g(x)h(x) + r(x), where again r(x) = 0 or deg r(x) <
deg g(x) in Fq[x]. As x^n − 1 corresponds to the zero codeword in C and C is an ideal in
R_n, r(x) ∈ C, a contradiction unless r(x) = 0, proving (iii).
Suppose that deg g(x) = n − k. By parts (ii) and (iii), if c(x) ∈ C with c(x) = 0 or
deg c(x) < n, then c(x) = g(x)f(x) in Fq[x]. If c(x) = 0, then f(x) = 0. If c(x) ≠ 0,
deg c(x) < n implies that deg f(x) < k, by Exercise 155. Therefore

    C = {g(x)f(x) | f(x) = 0 or deg f(x) < k}.
So C has dimension at most k and {g(x), xg(x), . . . , x^{k−1}g(x)} spans C. Since these k
polynomials are of different degrees, they are independent in Fq[x]. Since they are of
degree at most n − 1, they remain independent in R_n, yielding (iv) and (v). The codewords
in this basis, written as n-tuples, give G in part (vi). Part (vii) follows from Theorem 4.1.1.
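Part (vi) translates directly into code. A sketch over F2 (the bitmask encoding, with bit i the coefficient of x^i, is our own convention) that builds the k × n generator matrix from the shifts g(x), xg(x), . . . , x^{k−1}g(x):

```python
def cyclic_generator_matrix(g, n):
    """k x n generator matrix over F2 whose rows are g(x), x g(x), ...,
    x^{k-1} g(x); g is a bitmask divisor of x^n - 1 (bit i = coeff of x^i)."""
    k = n - (g.bit_length() - 1)       # k = n - deg g
    rows = []
    for shift in range(k):
        v = g << shift                 # multiply by x^shift
        rows.append([(v >> i) & 1 for i in range(n)])
    return rows

# g(x) = 1 + x^3 + x^6 divides x^9 - 1 and generates a [9, 3] cyclic code:
for row in cyclic_generator_matrix(0b1001001, 9):
    print(row)
# [1, 0, 0, 1, 0, 0, 1, 0, 0]
# [0, 1, 0, 0, 1, 0, 0, 1, 0]
# [0, 0, 1, 0, 0, 1, 0, 0, 1]
```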
We remark that part (ii) shows that R_n is a principal ideal ring. Also parts (i)
through (vi) of Theorem 4.2.1 hold even if gcd(n, q) ≠ 1. However, part (vii) requires
that gcd(n, q) = 1. A parity check matrix for a cyclic code will be given in Theorem
4.2.7.
Corollary 4.2.2 Let C be a nonzero cyclic code in R_n. The following are equivalent:
(i) g(x) is the monic polynomial of minimum degree in C.
(ii) C = ⟨g(x)⟩, g(x) is monic, and g(x) | (x^n − 1).
Proof: That (i) implies (ii) was shown in the proof of Theorem 4.2.1. Assume (ii). Let
g_1(x) be the monic polynomial of minimum degree in C. By the proof of Theorem 4.2.1(i)
and (ii), g_1(x) | g(x) in Fq[x] and C = ⟨g_1(x)⟩. As g_1(x) ∈ C = ⟨g(x)⟩, g_1(x) ≡ g(x)a(x)
(mod x^n − 1), implying g_1(x) = g(x)a(x) + (x^n − 1)b(x) in Fq[x]. Since g(x) | (x^n − 1),
g(x) | g(x)a(x) + (x^n − 1)b(x), so g(x) | g_1(x). As both g_1(x) and g(x) are monic and divide
one another in Fq[x], they are equal.
Theorem 4.2.1 shows that there is a monic polynomial g(x) dividing x^n − 1 and generating
C. Corollary 4.2.2 shows that the monic polynomial dividing x^n − 1 which generates C
is unique. This polynomial is called the generator polynomial of the cyclic code C. By the
corollary, this polynomial is both the monic polynomial in C of minimum degree and the
monic polynomial dividing x^n − 1 which generates C. So there is a one-to-one correspondence
between the nonzero cyclic codes and the divisors of x^n − 1 not equal to x^n − 1.
In order to have a bijective correspondence between all the cyclic codes in R_n and all the
monic divisors of x^n − 1, we define the generator polynomial of the zero cyclic code {0}
to be x^n − 1. (Note that x^n − 1 equals 0 in R_n.) This bijective correspondence leads to the
following corollary.
Corollary 4.2.3 The number of cyclic codes in R_n equals 2^m, where m is the number of
q-cyclotomic cosets modulo n. Moreover, the dimensions of cyclic codes in R_n are all
possible sums of the sizes of the q-cyclotomic cosets modulo n.
Example 4.2.4 In Example 4.1.2, we showed that, over F2, x^9 − 1 = (1 + x)(1 + x +
x^2)(1 + x^3 + x^6), and so there are eight binary cyclic codes C_i of length 9 with generator
polynomial g_i(x) given in the following table.

    i    dim    g_i(x)
    0    0      1 + x^9
    1    1      (1 + x + x^2)(1 + x^3 + x^6) = 1 + x + x^2 + · · · + x^8
    2    2      (1 + x)(1 + x^3 + x^6) = 1 + x + x^3 + x^4 + x^6 + x^7
    3    3      1 + x^3 + x^6
    4    6      (1 + x)(1 + x + x^2) = 1 + x^3
    5    7      1 + x + x^2
    6    8      1 + x
    7    9      1
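Corollary 4.2.3 and this table can be regenerated mechanically by taking all 2^3 subsets of the irreducible factors of x^9 − 1. A minimal sketch (the bitmask encoding is our own convention):

```python
from itertools import combinations

def gf2_mul(a, b):
    """Product in F2[x]; polynomials are bitmasks with bit i = coeff of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# Irreducible factors of x^9 - 1 over F2: 1+x, 1+x+x^2, 1+x^3+x^6
factors = [0b11, 0b111, 0b1001001]

dims = []
for t in range(len(factors) + 1):
    for subset in combinations(factors, t):
        g = 1
        for f in subset:
            g = gf2_mul(g, f)
        dims.append(9 - (g.bit_length() - 1))   # dimension = 9 - deg g

print(sorted(dims))  # [0, 1, 2, 3, 6, 7, 8, 9]
```

The eight dimensions are exactly all possible sums of the coset sizes 1, 2, and 6, as the corollary asserts.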
The following corollary to Theorem 4.2.1 shows the relationship between the generator
polynomials of two cyclic codes when one code is a subcode of the other. Its proof is left
as an exercise.
Corollary 4.2.5 Let C_1 and C_2 be cyclic codes over Fq with generator polynomials g_1(x)
and g_2(x), respectively. Then C_1 ⊆ C_2 if and only if g_2(x) | g_1(x).
Exercise 205 Prove Corollary 4.2.5.
Exercise 206 Find all pairs of codes C_i and C_j from Example 4.2.4 where C_i ⊆ C_j.
Exercise 207 Over F2, (1 + x) | (x^n − 1). Let C be the binary cyclic code ⟨1 + x⟩ of length
n. Let C_1 be any binary cyclic code of length n with generator polynomial g_1(x).
(a) What is the dimension of C?
(b) Prove that C is the set of all vectors in F2^n with even weight.
(c) If C_1 has only even weight codewords, what is the relationship between 1 + x and
g_1(x)?
(d) If C_1 has some odd weight codewords, what is the relationship between 1 + x and
g_1(x)?
Not surprisingly, the dual of a cyclic code is also cyclic, a fact whose proof we also leave
as an exercise.
Theorem 4.2.6 The dual code of a cyclic code is cyclic.
In Section 4.4, we will develop the tools to prove the following theorem about the
generator polynomial and generator matrix of the dual of a cyclic code; see Theorem 4.4.9.
The generator matrix of the dual is of course a parity check matrix of the original cyclic
code.
Theorem 4.2.7 Let C be an [n, k] cyclic code with generator polynomial g(x). Let
h(x) = (x^n − 1)/g(x) = Σ_{i=0}^{k} h_ix^i. Then the generator polynomial of C^⊥ is g^⊥(x) =
x^kh(x^{−1})/h(0). Furthermore, a generator matrix for C^⊥, and hence a parity check matrix
for C, is
    [h_k  h_{k−1}  h_{k−2}  · · ·  h_0  0    · · ·  0  ]
    [0    h_k      h_{k−1}  · · ·  h_1  h_0  · · ·  0  ]        (4.1)
    [· · ·         · · ·                     · · ·     ]
    [0    · · ·    0        h_k    · · ·            h_0].
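Theorem 4.2.7 can be checked on the [9, 3] code with g(x) = 1 + x^3 + x^6 from Example 4.2.4. The sketch below (the bitmask encoding and helper names are ours) computes h(x) = (x^9 − 1)/g(x), forms g^⊥(x) by reversing the coefficients of h, and verifies that the shifts of g are orthogonal to the rows of the matrix in (4.1):

```python
def gf2_mul(a, b):
    """Product in F2[x] (bitmask encoding: bit i = coeff of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_divmod(a, b):
    """Quotient and remainder in F2[x] (bitmask encoding)."""
    q = 0
    while a and a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift
    return q, a

n, g = 9, 0b1001001                   # g(x) = 1 + x^3 + x^6
h, rem = gf2_divmod((1 << n) | 1, g)  # h(x) = (x^9 - 1)/g(x)
assert rem == 0 and h == 0b1001       # h(x) = 1 + x^3

k = n - 6
# g_perp(x) = x^k h(1/x)/h(0): over F2 with h(0) = 1, reverse the k+1 coefficients
g_perp = int(format(h, "b").zfill(k + 1)[::-1], 2)
assert g_perp == 0b1001               # 1 + x^3 happens to be self-reciprocal

# Rows of (4.1) are the shifts of the reversed h; check orthogonality
# with the shifts of g from the generator matrix of C.
g_rows = [[(g << s >> i) & 1 for i in range(n)] for s in range(k)]
dual_rows = [[(g_perp << s >> i) & 1 for i in range(n)] for s in range(n - k)]
assert all(sum(x * y for x, y in zip(r1, r2)) % 2 == 0
           for r1 in g_rows for r2 in dual_rows)
```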
Example 4.2.8 The cyclic codes Fq^n and {0} are duals of one another. The repetition code
of length n over Fq is a cyclic code whose dual is the cyclic code of even-like vectors of
Fq^n.
Exercise 208 Prove Theorem 4.2.6.
Exercise 209 Based on dimension only, for 0 ≤ i ≤ 7 find C_i^⊥ for the cyclic codes C_i in Example 4.2.4.
It is also not surprising that a subfield subcode of a cyclic code is cyclic.
Theorem 4.2.9 Let C be a cyclic code over F_{q^t}. Then C|_{F_q} is also cyclic.
Exercise 210 Prove Theorem 4.2.9.
Exercise 211 Verify that the three codes C_1, C_2, and C_3 of length 7 over F_8 of Example 3.8.2 are cyclic. Verify that C_i|_{F_2} are all cyclic as well.
Cyclic codes are easier to decode than other codes because of their additional structure.
We will examine decoding algorithms for general cyclic codes in Section 4.6 and specific
families in Section 5.4. We now examine three ways to encode cyclic codes. Let C be a
cyclic code of length n over Fq with generator polynomial g(x) of degree n − k; so C has
dimension k.
The first encoding is based on the natural encoding procedure described in Section 1.11.
Let G be the generator matrix obtained from the shifts of g(x) in Theorem 4.2.1. We encode
the message m ∈ Fqk as the codeword c = mG. We leave it as Exercise 212 to show that if
m(x) and c(x) are the polynomials in Fq [x] associated to m and c, then c(x) = m(x)g(x).
However, this encoding is not systematic.
Exercise 212 Let C be a cyclic code of length n over Fq with generator polynomial g(x).
Let G be the generator matrix obtained from the shifts of g(x) in Theorem 4.2.1. Prove
that the encoding of the message m ∈ Fqk as the codeword c = mG is the same as forming
the product c(x) = m(x)g(x) in Fq [x], where m(x) and c(x) are the polynomials in Fq [x]
associated to m and c.
The second encoding procedure is systematic. The polynomial m(x) associated to the message m is of degree at most k − 1 (or is the zero polynomial). The polynomial x^{n−k} m(x) has degree at most n − 1 and has its first n − k coefficients equal to 0; thus the message is contained in the coefficients of x^{n−k}, x^{n−k+1}, ..., x^{n−1}. By the Division Algorithm,

x^{n−k} m(x) = g(x)a(x) + r(x),

where deg r(x) < n − k or r(x) = 0.
Let c(x) = x^{n−k} m(x) − r(x); as c(x) is a multiple of g(x), c(x) ∈ C. Also c(x) differs from x^{n−k} m(x) only in the coefficients of 1, x, ..., x^{n−k−1}, as deg r(x) < n − k. So c(x) contains the message m in the coefficients of the terms of degree at least n − k.
The third encoding procedure, also systematic, for C = ⟨g(x)⟩ uses the generator polynomial g^⊥(x) of the dual code C^⊥ as given in Theorem 4.2.7. As C is an [n, k] code, if c = (c_0, c_1, ..., c_{n−1}) ∈ C, once c_0, c_1, ..., c_{k−1} are known, the remaining components c_k, ..., c_{n−1} are determined from Hc^T = 0, where H is the parity check matrix (4.1). We can scale the rows of H so that its rows are shifts of the monic polynomial g^⊥(x) = h′_0 + h′_1 x + ··· + h′_{k−1} x^{k−1} + x^k. To encode C, choose k information bits c_0, c_1, ..., c_{k−1}; then

c_i = −Σ_{j=0}^{k−1} h′_j c_{i−k+j},    (4.2)

where the computation of c_i is performed in the order i = k, k + 1, ..., n − 1.
Exercise 213 Let C be a binary cyclic code of length 15 with generator polynomial g(x) = (1 + x + x^4)(1 + x + x^2 + x^3 + x^4).
(a) Encode the message m(x) = 1 + x 2 + x 5 using the first encoding procedure (the nonsystematic encoding) described in this section.
(b) Encode the message m(x) = 1 + x 2 + x 5 using the second encoding procedure (the
first systematic encoding) described in this section.
(c) Encode the message m(x) = 1 + x 2 + x 5 using the third encoding procedure (the second systematic encoding) described in this section.
Exercise 214 Show that in any cyclic code of dimension k, any set of k consecutive
coordinates forms an information set.
Each of the encoding schemes can be implemented using linear shift-registers. We
illustrate this for binary codes with the last scheme using a linear feedback shift-register.
For more details on implementations of the other encoding schemes we refer the reader
to [18, 21, 233, 256]. The main components of a linear feedback shift-register are delay
elements (also called flip-flops) and binary adders shown in Figure 4.1. The shift-register is
run by an external clock which generates a timing signal, or clock cycle, every t0 seconds,
where t0 can be very small. Generally, a delay element stores one bit (a 0 or a 1) for one clock
cycle, after which the bit is pushed out and replaced by another bit. A linear shift-register
is a series of delay elements; a bit enters at one end of the shift-register and moves to the
next delay element with each new clock cycle. A linear feedback shift-register is a linear
shift-register in which the output is fed back into the shift-register as part of the input. The
Figure 4.1 Delay element and 3-input binary adder.
[Figure: a source feeds a three-stage shift-register (cells labeled x^2, x, 1) through a switch with positions A and B; a feedback adder taps the cells labeled x^2 and 1, and the output passes to the channel.]

Figure 4.2 Encoder for C, where C^⊥ = ⟨1 + x^2 + x^3⟩.
adder takes all its input signals and adds them in binary; this process is considered to occur
instantaneously.
Example 4.2.10 In Figure 4.2, we construct a linear feedback shift-register for encoding
the [7, 3] binary cyclic code C with generator polynomial g(x) = 1 + x + x^2 + x^4. Then g^⊥(x) = 1 + x^2 + x^3, and so the parity check equations from (4.2) are:
c0 + c2 = c3 ,
c1 + c3 = c4 ,
c2 + c4 = c5 ,
c3 + c5 = c6 .
The shift-register has three flip-flops. Suppose we input c0 = 1, c1 = 0, and c2 = 0 into
the shift-register. Initially, before the first clock cycle, the shift-register has three unknown
quantities, which we can denote by ⋆⋆⋆. The switch is set at position A for three clock
cycles, in which case the digits 1, 0, 0 from the source are moved into the shift-register
from left to right, as indicated in Table 4.1. These three bits also move to the transmission
channel. Between clock cycles 3 and 4, the switch is set to position B, which enables the
Table 4.1 Shift-register for C, where C^⊥ = ⟨1 + x^2 + x^3⟩, with input 100

Clock cycle   Switch   Source   Channel   Register
1             A        1        1         1⋆⋆
2             A        0        0         01⋆
3             A        0        0         001
4             B                 1         100
5             B                 1         110
6             B                 1         111
7             B                 0         011
feedback to take place. The switch remains in this position for 4 clock cycles; during this
time no further input arrives from the source. During cycle 3, the register reads 001, which
corresponds to c0 c1 c2 = 100. Then at cycle 4, c0 = 1 and c2 = 0 from the shift-register
pass through the binary adder and are added to give c3 = 1; the result both passes to the
channel and is fed back into the shift-register from the left. Note that the shift-register has
merely executed the first of the above parity check equations. The shift-register now contains
100, which corresponds to c1 c2 c3 = 001. At clock cycle 5, the shift-register performs the
binary addition c1 + c3 = 0 + 1 = 1 = c4 , which satisfies the second of the parity check
equations. Clock cycles 6 and 7 produce c5 and c6 as indicated in the table; these satisfy
the last two parity check equations. The shift-register has sent c0 c1 · · · c6 = 1001110 to
the channel. (Notice also that the codeword is the first entry of the column “Register” in
Table 4.1.) Then the switch is reset to position A, and the shift-register is ready to receive
input for the next codeword.
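The clock-by-clock behavior in Table 4.1 can be mimicked in software. The sketch below (mine, not the book's) models the three delay elements as a list, loads the register from the source while the switch is at A, and applies the feedback taps (the cells labeled x^2 and 1) while it is at B.

```python
# Simulate the Figure 4.2 encoder.  reg[0] is the newest (leftmost) cell,
# so the string form of reg matches the "Register" column of Table 4.1.

def run_encoder(info_bits, n):
    reg = ['*'] * len(info_bits)        # '*' marks the unknown initial contents
    channel, states = [], []
    for t in range(n):
        if t < len(info_bits):          # switch at A: load from the source
            bit = info_bits[t]
        else:                           # switch at B: feedback c_i = c_{i-3} + c_{i-1}
            bit = (reg[-1] + reg[0]) % 2
        channel.append(bit)
        reg = [bit] + reg[:-1]          # shift in from the left
        states.append(''.join(str(b) for b in reg))
    return channel, states

channel, states = run_encoder([1, 0, 0], 7)
print(states)    # ['1**', '01*', '001', '100', '110', '111', '011'] as in Table 4.1
print(channel)   # [1, 0, 0, 1, 1, 1, 0]: the codeword 1001110
```

The simulated register states reproduce the "Register" column of Table 4.1 exactly, and the channel output is the codeword 1001110.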
Exercise 215 Give the contents of the shift-register in Figure 4.2 in the form of a table
similar to Table 4.1 for the following input sequences. Also give the codeword produced.
(a) c0 c1 c2 = 011,
(b) c0 c1 c2 = 101.
Notice that we labeled the three delay elements in Figure 4.2 with x^2, x, and 1. The vertical wires entering the binary adder came after the delay elements x^2 and 1. These are the nonzero terms in g^⊥(x) of degree less than 3. In general if deg g^⊥(x) = k, the k delay elements are labeled x^{k−1}, x^{k−2}, ..., 1 from left to right. The delay elements with wires to a binary adder are precisely those with labels coming after the nonzero terms in g^⊥(x) of degree less than k. Examining (4.2) shows why this works.
Exercise 216 Do the following:
(a) Let C be the [7, 4] binary cyclic code with generator polynomial g(x) = 1 + x + x^3. Find g^⊥(x).
(b) Draw a linear feedback shift-register to encode C.
(c) Give the contents of the shift-register from part (b) in the form of a table similar to Table 4.1 for the input sequence c0 c1 c2 c3 = 1001. Also give the codeword
produced.
Exercise 217 Do the following:
(a) Let C be the [9, 7] binary cyclic code with generator polynomial g(x) = 1 + x + x^2 shown in Example 4.2.4. Find g^⊥(x).
(b) Draw a linear feedback shift-register to encode C.
(c) Give the contents of the shift-register from part (b) in the form of a table similar
to Table 4.1 for the input sequence c0 c1 · · · c6 = 1011011. Also give the codeword
produced.
The idea of a cyclic code has been generalized in the following way. If a code C has
the property that there exists an integer s such that the shift of a codeword by s positions
is again a codeword, C is called a quasi-cyclic code. Cyclic codes are quasi-cyclic codes
with s = 1. Quasi-cyclic codes with s = 2 are sometimes monomially equivalent to double
circulant codes; a double circulant code has generator matrix [I A], where A is a circulant
matrix. We note that the term “double circulant code” is sometimes used when the generator
matrix has other “cyclic-like” structures such as
    ⎡       0  1 ··· 1 ⎤
    ⎢       1          ⎥
    ⎢   I   ⋮    B     ⎥
    ⎣       1          ⎦
where B is a circulant matrix; such a code may be called a “bordered double circulant
code.” See Section 9.8 where we examine more extensively the construction of codes using
circulant matrices.
4.3 Idempotents and multipliers
Besides the generator polynomial, there are many polynomials that can be used to generate
a cyclic code. A general result about which polynomials generate a given cyclic code will be
presented in Theorem 4.4.4. There is another very specific polynomial, called an idempotent
generator, which can be used to generate a cyclic code.
An element e of a ring satisfying e^2 = e is called an idempotent. As stated earlier without proof, the ring R_n is semi-simple when gcd(n, q) = 1. Therefore it follows from the Wedderburn Structure Theorems that each cyclic code in R_n contains a unique idempotent which generates the ideal. This idempotent is called the generating idempotent of the cyclic code. In the next theorem we prove this fact directly and in the process show how to determine the generating idempotent of a cyclic code. Recall that a unity in a ring is a (nonzero) multiplicative identity in the ring, which may or may not exist; however, if it exists, it is unique.
Example 4.3.1 The generating idempotent for the zero cyclic code {0} is 0, while that for
the cyclic code Rn is 1.
Theorem 4.3.2 Let C be a cyclic code in R_n. Then:
(i) there exists a unique idempotent e(x) ∈ C such that C = ⟨e(x)⟩, and
(ii) if e(x) is a nonzero idempotent in C, then C = ⟨e(x)⟩ if and only if e(x) is a unity of C.
Proof: If C is the zero code, then the idempotent is the zero polynomial and (i) is clear and
(ii) does not apply.
So we assume that C is nonzero. We prove (ii) first. Suppose that e(x) is a unity in C. Then ⟨e(x)⟩ ⊆ C as C is an ideal. If c(x) ∈ C, then c(x)e(x) = c(x) in C, implying that ⟨e(x)⟩ = C. Conversely, suppose that e(x) is a nonzero idempotent such that C = ⟨e(x)⟩. Then every element c(x) ∈ C can be written in the form c(x) = f(x)e(x). But c(x)e(x) = f(x)(e(x))^2 = f(x)e(x) = c(x), implying e(x) is a unity for C.
As C is nonzero, by (ii) if e_1(x) and e_2(x) are generating idempotents, then both are unities and e_1(x) = e_2(x)e_1(x) = e_2(x). So we only need to show that a generating idempotent exists. If g(x) is the generator polynomial for C, then g(x) | (x^n − 1) by Theorem 4.2.1. Let h(x) = (x^n − 1)/g(x). Then gcd(g(x), h(x)) = 1 in F_q[x] as x^n − 1 has distinct roots. By the Euclidean Algorithm there exist polynomials a(x), b(x) ∈ F_q[x] so that a(x)g(x) + b(x)h(x) = 1. Let e(x) ≡ a(x)g(x) (mod x^n − 1); that is, e(x) is the coset representative of a(x)g(x) + (x^n − 1) in R_n. Then in R_n, e(x)^2 ≡ (a(x)g(x))(1 − b(x)h(x)) ≡ a(x)g(x) ≡ e(x) (mod x^n − 1) as g(x)h(x) = x^n − 1. Also if c(x) ∈ C, then c(x) = f(x)g(x), implying c(x)e(x) ≡ f(x)g(x)(1 − b(x)h(x)) ≡ f(x)g(x) ≡ c(x) (mod x^n − 1); so e(x) is a unity in C, and (i) follows from (ii).
The proof shows that one way to find the generating idempotent e(x) for a cyclic code
C from the generator polynomial g(x) is to solve 1 = a(x)g(x) + b(x)h(x) for a(x) using
the Euclidean Algorithm, where h(x) = (x n − 1)/g(x). Then reducing a(x)g(x) modulo
x n − 1 produces e(x). We can produce g(x) if we know e(x) as the following theorem
shows.
Theorem 4.3.3 Let C be a cyclic code over Fq with generating idempotent e(x). Then the
generator polynomial of C is g(x) = gcd(e(x), x n − 1) computed in Fq [x].
Proof: Let d(x) = gcd(e(x), x^n − 1) in F_q[x], and let g(x) be the generator polynomial for C. As d(x) | e(x), e(x) = d(x)k(x), implying that every element of C = ⟨e(x)⟩ is also a multiple of d(x); thus C ⊆ ⟨d(x)⟩. By Theorem 4.2.1, in F_q[x] g(x) | (x^n − 1) and g(x) | e(x) as e(x) ∈ C. So by Exercise 158, g(x) | d(x), implying d(x) ∈ C. Thus ⟨d(x)⟩ ⊆ C, and so C = ⟨d(x)⟩. Since d(x) is a monic divisor of x^n − 1 generating C, d(x) = g(x) by Corollary 4.2.2.
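Both directions are easy to machine-check over F2. In the sketch below (mine, not the book's; the data is row i = 4 of the table in the example that follows), the [7, 4] Hamming code with g(x) = 1 + x + x^3 has generating idempotent e(x) = x + x^2 + x^4, and gcd(e(x), x^7 − 1) recovers g(x).

```python
# Verify e(x)^2 = e(x) in R_7 and that g(x) = gcd(e(x), x^7 - 1) over F2.
# Coefficient-list representation; trailing zeros are stripped for the gcd.

def poly_mod_gf2(num, den):
    num = num[:]
    while len(num) >= len(den):
        if num[-1]:
            shift = len(num) - len(den)
            for i, d in enumerate(den):
                num[shift + i] ^= d
        num.pop()
    while num and num[-1] == 0:
        num.pop()
    return num

def poly_gcd_gf2(a, b):
    while b:
        a, b = b, poly_mod_gf2(a, b)
    return a

def poly_mul_gf2(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

x7_minus_1 = [1, 0, 0, 0, 0, 0, 0, 1]     # x^7 - 1 = 1 + x^7 over F2
e = [0, 1, 1, 0, 1]                       # e(x) = x + x^2 + x^4
print(poly_mod_gf2(poly_mul_gf2(e, e), x7_minus_1))  # [0, 1, 1, 0, 1]: e is idempotent
print(poly_gcd_gf2(e, x7_minus_1))        # [1, 1, 0, 1] = 1 + x + x^3 = g(x)
```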
Example 4.3.4 The following table gives all the cyclic codes C i of length 7 over
F2 together with their generator polynomials gi (x) and their generating idempotents
ei (x).
i   dim   g_i(x)                      e_i(x)
0   0     1 + x^7                     0
1   1     1 + x + x^2 + ··· + x^6     1 + x + x^2 + ··· + x^6
2   3     1 + x^2 + x^3 + x^4         1 + x^3 + x^5 + x^6
3   3     1 + x + x^2 + x^4           1 + x + x^2 + x^4
4   4     1 + x + x^3                 x + x^2 + x^4
5   4     1 + x^2 + x^3               x^3 + x^5 + x^6
6   6     1 + x                       x + x^2 + ··· + x^6
7   7     1                           1
The two codes of dimension 4 are [7, 4, 3] Hamming codes.
Example 4.3.5 The following table gives all the cyclic codes C i of length 11 over F3
together with their generator polynomials gi (x) and their generating idempotents ei (x).
i   dim   g_i(x)                             e_i(x)
0   0     x^11 − 1                           0
1   1     1 + x + x^2 + ··· + x^10           −1 − x − x^2 − ··· − x^10
2   5     1 − x − x^2 − x^3 + x^4 + x^6      1 + x + x^3 + x^4 + x^5 + x^9
3   5     1 + x^2 − x^3 − x^4 − x^5 + x^6    1 + x^2 + x^6 + x^7 + x^8 + x^10
4   6     −1 + x^2 − x^3 + x^4 + x^5         −x^2 − x^6 − x^7 − x^8 − x^10
5   6     −1 − x + x^2 − x^3 + x^5           −x − x^3 − x^4 − x^5 − x^9
6   10    −1 + x                             −1 + x + x^2 + ··· + x^10
7   11    1                                  1
The two codes of dimension 6 are [11, 6, 5] ternary Golay codes.
Notice that Theorem 1.8.1 shows that the only [7, 4, 3] binary code is the Hamming code.
In Section 10.4.1 we will show that the only [11, 6, 5] ternary code is the Golay code. By
Examples 4.3.4 and 4.3.5 these two codes have cyclic representations.
Exercise 218 Verify the entries in the table in Example 4.3.4.
Exercise 219 Verify the entries in the table in Example 4.3.5.
Exercise 220 Find the generator polynomials and generating idempotents of all cyclic
codes over F3 of length 8 and dimensions 3 and 5.
Exercise 221 Let j(x) = 1 + x + x^2 + ··· + x^{n−1} in R_n and j̄(x) = (1/n) j(x).
(a) Prove that j(x)^2 = n j(x) in R_n.
(b) Prove that j̄(x) is an idempotent in R_n.
(c) Prove that j̄(x) is the generating idempotent of the repetition code of length n over F_q.
(d) Prove that if c(x) is in R_n, then c(x) j̄(x) = c(1) j̄(x) in R_n.
(e) Prove that if c(x) is in R_n, then c(x) j(x) = 0 in R_n if c(x) corresponds to an even-like vector in F_q^n, and c(x) j(x) is a nonzero multiple of j(x) in R_n if c(x) corresponds to an odd-like vector in F_q^n.
The next theorem shows that, just as for the generator polynomial, the generating idempotent and its first k − 1 cyclic shifts form a basis of a cyclic code.
Theorem 4.3.6 Let C be an [n, k] cyclic code with generating idempotent e(x) = Σ_{i=0}^{n−1} e_i x^i. Then the k × n matrix

    ⎡ e_0         e_1         e_2         ···   e_{n−2}     e_{n−1}  ⎤
    ⎢ e_{n−1}     e_0         e_1         ···   e_{n−3}     e_{n−2}  ⎥
    ⎢ ⋮                                                              ⎥
    ⎣ e_{n−k+1}   e_{n−k+2}   e_{n−k+3}   ···   e_{n−k−1}   e_{n−k}  ⎦

is a generator matrix for C.
Proof: This is equivalent to saying that {e(x), xe(x), ..., x^{k−1} e(x)} is a basis of C. Therefore it suffices to show that if a(x) ∈ F_q[x] has degree less than k such that a(x)e(x) = 0, then a(x) = 0. Let g(x) be the generator polynomial for C. If a(x)e(x) = 0, then 0 = a(x)e(x)g(x) = a(x)g(x) as e(x) is the unity of C by Theorem 4.3.2, contradicting Theorem 4.2.1(v) unless a(x) = 0.
If C 1 and C 2 are codes of length n over Fq , then C 1 + C 2 = {c1 + c2 | c1 ∈ C 1 and c2 ∈ C 2 }
is the sum of C 1 and C 2 . Both the intersection and the sum of two cyclic codes are cyclic,
and their generator polynomials and generating idempotents are determined in the next
theorem.
Theorem 4.3.7 Let C i be a cyclic code of length n over Fq with generator polynomial gi (x)
and generating idempotent ei (x) for i = 1 and 2. Then:
(i) C 1 ∩ C 2 has generator polynomial lcm(g1 (x), g2 (x)) and generating idempotent
e1 (x)e2 (x), and
(ii) C 1 + C 2 has generator polynomial gcd(g1 (x), g2 (x)) and generating idempotent
e1 (x) + e2 (x) − e1 (x)e2 (x).
Proof: We prove (ii) and leave the proof of (i) as an exercise. We also leave it as an exercise to show that the sum of two cyclic codes is cyclic. Let g(x) = gcd(g_1(x), g_2(x)). It follows from the Euclidean Algorithm that g(x) = g_1(x)a(x) + g_2(x)b(x) for some a(x) and b(x) in F_q[x]. So g(x) ∈ C_1 + C_2. Since C_1 + C_2 is cyclic, ⟨g(x)⟩ ⊆ C_1 + C_2. On the other hand g(x) | g_1(x), which shows that C_1 ⊆ ⟨g(x)⟩ by Corollary 4.2.5; similarly C_2 ⊆ ⟨g(x)⟩, implying C_1 + C_2 ⊆ ⟨g(x)⟩. So C_1 + C_2 = ⟨g(x)⟩. Since g(x) | (x^n − 1) as g(x) | g_1(x) and g(x) is monic, g(x) is the generator polynomial for C_1 + C_2 by Corollary 4.2.2. If c(x) = c_1(x) + c_2(x) where c_i(x) ∈ C_i for i = 1 and 2, then c(x)(e_1(x) + e_2(x) − e_1(x)e_2(x)) = c_1(x) + c_1(x)e_2(x) − c_1(x)e_2(x) + c_2(x)e_1(x) + c_2(x) − c_2(x)e_1(x) = c(x). Thus (ii) follows by Theorem 4.3.2 since e_1(x) + e_2(x) − e_1(x)e_2(x) ∈ C_1 + C_2.
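The gcd/lcm part of Theorem 4.3.7 can be checked mechanically. The sketch below (mine, not the book's) takes C_2 and C_3 of Example 4.3.4: gcd(g_2, g_3) should generate C_2 + C_3, and lcm(g_2, g_3) should generate C_2 ∩ C_3.

```python
# Sum and intersection of binary cyclic codes via gcd/lcm of generators.
# Coefficient-list representation; trailing zeros are stripped.

def poly_divmod_gf2(num, den):
    num = num[:]
    quot = [0] * max(len(num) - len(den) + 1, 1)
    while len(num) >= len(den):
        if num[-1]:
            shift = len(num) - len(den)
            quot[shift] = 1
            for i, d in enumerate(den):
                num[shift + i] ^= d
        num.pop()
    while num and num[-1] == 0:
        num.pop()
    return quot, num

def poly_gcd_gf2(a, b):
    while b:
        a, b = b, poly_divmod_gf2(a, b)[1]
    return a

def poly_mul_gf2(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

g2 = [1, 0, 1, 1, 1]       # g_2(x) = 1 + x^2 + x^3 + x^4
g3 = [1, 1, 1, 0, 1]       # g_3(x) = 1 + x + x^2 + x^4
d = poly_gcd_gf2(g2, g3)
print(d)                   # [1, 1] = 1 + x: C_2 + C_3 is the even weight code C_6
lcm, _ = poly_divmod_gf2(poly_mul_gf2(g2, g3), d)
print(lcm)                 # [1, 0, 0, 0, 0, 0, 0, 1] = x^7 - 1: C_2 ∩ C_3 = {0}
```

This anticipates two of the answers requested in Exercise 225.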
Exercise 222 Prove part (i) of Theorem 4.3.7. Also prove that if e1 (x) and e2 (x) are
idempotents, so are e1 (x)e2 (x), e1 (x) + e2 (x) − e1 (x)e2 (x), and 1 − e1 (x).
Exercise 223 Show that the sum of two cyclic codes is cyclic as claimed in Theorem
4.3.7.
Exercise 224 Let C i be a cyclic code of length n over Fq for i = 1 and 2. Let α be a primitive
nth root of unity in some extension field of Fq . Suppose C i has generator polynomial gi (x),
where

g_i(x) = ∏_{s∈K_i} M_{α^s}(x)
is the factorization of g_i(x) into minimal polynomials over F_q with K_i a subset of the representatives of the q-cyclotomic cosets modulo n. Assume that the representative of a coset is the smallest element in the coset. What are the subsets of representatives of q-cyclotomic cosets that will produce the generator polynomials for the codes C_1 + C_2 and C_1 ∩ C_2?
Exercise 225 Find the generator polynomials and the generating idempotents of the following codes from Example 4.3.4: C_1 + C_6, C_2 + C_3, C_2 + C_4, C_2 + C_5, C_3 + C_4, C_3 + C_5, C_1 ∩ C_6, C_2 ∩ C_3, C_2 ∩ C_4, C_2 ∩ C_5, C_3 ∩ C_4, and C_3 ∩ C_5.
Exercise 226 Which pairs of codes in Exercise 220 sum to the code F_3^8? Which pairs of codes in that exercise have intersection {0}?
Exercise 227 If C_i is a cyclic code with generator polynomial g_i(x) and generating idempotent e_i(x) for 1 ≤ i ≤ 3, what are the generator polynomial and generating idempotent of C_1 + C_2 + C_3?
We are now ready to describe a special set of idempotents, called primitive idempotents,
that, once known, will produce all the idempotents in Rn and therefore all the cyclic codes.
Let x^n − 1 = f_1(x) ··· f_s(x), where f_i(x) is irreducible over F_q for 1 ≤ i ≤ s. The f_i(x) are distinct as x^n − 1 has distinct roots. Let f̂_i(x) = (x^n − 1)/f_i(x). In the next theorem we show that the ideals ⟨f̂_i(x)⟩ of R_n are the minimal ideals of R_n. Recall that an ideal I in a ring R is a minimal ideal provided there is no proper ideal between {0} and I. We denote the generating idempotent of ⟨f̂_i(x)⟩ by ê_i(x). The idempotents ê_1(x), ..., ê_s(x) are called the primitive idempotents of R_n.
Theorem 4.3.8 The following hold in R_n.
(i) The ideals ⟨f̂_i(x)⟩ for 1 ≤ i ≤ s are all the minimal ideals of R_n.
(ii) R_n is the vector space direct sum of the ⟨f̂_i(x)⟩ for 1 ≤ i ≤ s.
(iii) If i ≠ j, then ê_i(x)ê_j(x) = 0 in R_n.
(iv) Σ_{i=1}^{s} ê_i(x) = 1 in R_n.
(v) The only idempotents in ⟨f̂_i(x)⟩ are 0 and ê_i(x).
(vi) If e(x) is a nonzero idempotent in R_n, then there is a subset T of {1, 2, ..., s} such that e(x) = Σ_{i∈T} ê_i(x) and ⟨e(x)⟩ = Σ_{i∈T} ⟨f̂_i(x)⟩.
Proof: Suppose that ⟨f̂_i(x)⟩ is not a minimal ideal of R_n. By Corollary 4.2.5, there would be a generator polynomial g(x) of a nonzero ideal properly contained in ⟨f̂_i(x)⟩ such that f̂_i(x) | g(x) with g(x) ≠ f̂_i(x). As f_i(x) is irreducible and g(x) | (x^n − 1), this is impossible. So ⟨f̂_i(x)⟩ is a minimal ideal of R_n, completing part of (i).

As {f̂_i(x) | 1 ≤ i ≤ s} has no common irreducible factor of x^n − 1 and each polynomial in the set divides x^n − 1, gcd(f̂_1(x), ..., f̂_s(x)) = 1. Applying the Euclidean Algorithm inductively,

1 = Σ_{i=1}^{s} a_i(x) f̂_i(x)    (4.3)

for some a_i(x) ∈ F_q[x]. So 1 is in the sum of the ideals ⟨f̂_i(x)⟩, which is itself an ideal of R_n. In any ring, the only ideal containing the identity of the ring is the ring itself. This proves that R_n is the vector space sum of the ideals ⟨f̂_i(x)⟩. To prove it is a direct sum, we must show that ⟨f̂_i(x)⟩ ∩ Σ_{j≠i} ⟨f̂_j(x)⟩ = {0} for 1 ≤ i ≤ s. As f_i(x) | f̂_j(x) for j ≠ i, f_j(x) ∤ f̂_j(x), and the irreducible factors of x^n − 1 are distinct, we conclude that f_i(x) = gcd{f̂_j(x) | 1 ≤ j ≤ s, j ≠ i}. Applying induction to the results of Theorem 4.3.7(ii) shows that ⟨f_i(x)⟩ = Σ_{j≠i} ⟨f̂_j(x)⟩. So ⟨f̂_i(x)⟩ ∩ Σ_{j≠i} ⟨f̂_j(x)⟩ = ⟨f̂_i(x)⟩ ∩ ⟨f_i(x)⟩ = ⟨lcm(f̂_i(x), f_i(x))⟩ = ⟨x^n − 1⟩ = {0} by Theorem 4.3.7, completing (ii).

Let M = ⟨m(x)⟩ be any minimal ideal of R_n. As

0 ≠ m(x) = m(x) · 1 = Σ_{i=1}^{s} m(x) a_i(x) f̂_i(x)

by (4.3), there is an i such that m(x) a_i(x) f̂_i(x) ≠ 0. Hence M ∩ ⟨f̂_i(x)⟩ ≠ {0} as m(x) a_i(x) f̂_i(x) ∈ M ∩ ⟨f̂_i(x)⟩, and therefore M = ⟨f̂_i(x)⟩ by minimality of M and ⟨f̂_i(x)⟩. This completes the proof of (i).

If i ≠ j, then ê_i(x)ê_j(x) ∈ ⟨f̂_i(x)⟩ ∩ ⟨f̂_j(x)⟩ = {0} by (ii), yielding (iii). By using (iii) and applying induction to Theorem 4.3.7(ii), Σ_{i=1}^{s} ê_i(x) is the generating idempotent of Σ_{i=1}^{s} ⟨f̂_i(x)⟩ = R_n by part (ii). The generating idempotent of R_n is 1, verifying (iv).

If e(x) is a nonzero idempotent in ⟨f̂_i(x)⟩, then ⟨e(x)⟩ is an ideal contained in ⟨f̂_i(x)⟩. By minimality, as e(x) is nonzero, ⟨f̂_i(x)⟩ = ⟨e(x)⟩, implying by Theorem 4.3.2 that e(x) = ê_i(x) as both are the unique unity of ⟨f̂_i(x)⟩. Thus (v) holds.

For (vi), note that e(x)ê_i(x) is an idempotent in ⟨f̂_i(x)⟩. Thus e(x)ê_i(x) is either 0 or ê_i(x) by (v). Let T = {i | e(x)ê_i(x) ≠ 0}. Then by (iv), e(x) = e(x) · 1 = e(x) Σ_{i=1}^{s} ê_i(x) = Σ_{i=1}^{s} e(x)ê_i(x) = Σ_{i∈T} ê_i(x). Furthermore, ⟨e(x)⟩ = ⟨Σ_{i∈T} ê_i(x)⟩ = Σ_{i∈T} ⟨f̂_i(x)⟩ by Theorem 4.3.7(ii) and induction.
We remark that the minimal ideals in this theorem are extension fields of Fq . Theorem
4.4.19 will also characterize these minimal ideals using the trace map.
Theorem 4.3.9 Let M be a minimal ideal of Rn . Then M is an extension field of Fq .
Proof: We only need to show that every nonzero element in M has a multiplicative inverse in M. Let a(x) ∈ M with a(x) not zero. Then ⟨a(x)⟩ is a nonzero ideal of R_n contained in M, and hence ⟨a(x)⟩ = M. So if e(x) is the unity of M, there is an element b(x) in R_n with a(x)b(x) = e(x). Now c(x) = b(x)e(x) ∈ M as e(x) ∈ M. Hence a(x)c(x) = e(x)^2 = e(x).
Exercise 228 What fields arise as the minimal ideals in R7 and R15 over F2 ?
Theorem 4.3.8 shows that every idempotent is a sum of primitive idempotents and that
cyclic codes are sums of minimal cyclic codes. An interesting consequence, found in [280],
of this characterization of cyclic codes is that the dimension of a sum of cyclic codes satisfies
the same formula as that of the inclusion–exclusion principle, a fact that fails in general.
Theorem 4.3.10 Let C i be a cyclic code of length n over Fq for 1 ≤ i ≤ a. Then:
dim(C_1 + C_2 + ··· + C_a) = Σ_i dim(C_i) − Σ_{i<j} dim(C_i ∩ C_j) + Σ_{i<j<k} dim(C_i ∩ C_j ∩ C_k) − ··· + (−1)^{a−1} dim(C_1 ∩ C_2 ∩ ··· ∩ C_a).
Proof: Let {ê_i(x) | 1 ≤ i ≤ s} be the primitive idempotents of R_n. By Theorem 4.3.8, the minimal ideals of R_n are the ⟨ê_i(x)⟩. Fix a basis B_i of ⟨ê_i(x)⟩ for 1 ≤ i ≤ s. Also by Theorem 4.3.8, each C_i is a direct sum of {⟨ê_j(x)⟩ | j ∈ S_i} for some subset S_i of {1, 2, ..., s}. Thus a basis of C_{i_1} + ··· + C_{i_b} is B_{i_1} ∪ ··· ∪ B_{i_b}, and this basis contains |B_{i_1} ∪ ··· ∪ B_{i_b}| = dim(C_{i_1} + ··· + C_{i_b}) elements, where |B| is the number of (distinct) elements in B. A basis of C_{i_1} ∩ ··· ∩ C_{i_b} is B_{i_1} ∩ ··· ∩ B_{i_b}, and this basis contains |B_{i_1} ∩ ··· ∩ B_{i_b}| = dim(C_{i_1} ∩ ··· ∩ C_{i_b}) elements. Since dim(C_1 + C_2 + ··· + C_a) = |B_1 ∪ B_2 ∪ ··· ∪ B_a|, we can apply the inclusion–exclusion principle to obtain the result.
Example 4.3.11 Theorem 4.3.10 does not work in general for noncyclic codes. For example, for 1 ≤ i ≤ 3, let C_i be a binary code of length 2 with generator matrix G_i, where

G_1 = [1 0],   G_2 = [0 1],   and   G_3 = [1 1].

Then dim(C_i) = 1 for 1 ≤ i ≤ 3, dim(C_i ∩ C_j) = 0 for i ≠ j, and dim(C_1 ∩ C_2 ∩ C_3) = 0. But dim(C_1 + C_2 + C_3) = 2, which does not equal 1 + 1 + 1 − 0 − 0 − 0 + 0.
Exercise 229 Prove that if C 1 and C 2 are linear codes of length n over Fq , then dim(C 1 +
C 2 ) = dim(C 1 ) + dim(C 2 ) − dim(C 1 ∩ C 2 ).
We turn now to a particular permutation which maps idempotents of Rn to idempotents of
Rn . Let a be an integer such that gcd(a, n) = 1. The function µa defined on {0, 1, . . . , n − 1}
by iµa ≡ ia (mod n) is a permutation of the coordinate positions {0, 1, . . . , n − 1} of a
cyclic code of length n and is called a multiplier. Because cyclic codes of length n are
represented as ideals in Rn , for a > 0 it is convenient to regard µa as acting on Rn by
f(x)µ_a ≡ f(x^a) (mod x^n − 1).    (4.4)

This equation is consistent with the original definition of µ_a because x^i µ_a = x^{ia} = x^{ia+jn} in R_n for an integer j such that 0 ≤ ia + jn < n, since x^n = 1 in R_n. In other words, x^i µ_a = x^{ia mod n}. If a < 0, we can attach meaning to f(x^a) in R_n by defining x^i µ_a = x^{ia mod n}, where, of course, 0 ≤ ia mod n < n. With this interpretation, (4.4) is consistent with the original definition of µ_a when a < 0. We leave the proof of the following as an exercise.
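As a quick sketch (mine, not the book's), µ_a is just a permutation of coefficient positions. The example applies µ_3 and µ_2 to the idempotent e(x) = x + x^2 + x^4 of Example 4.3.4; note that µ_2 (which is µ_q here, since q = 2) maps this idempotent to itself.

```python
# Apply the multiplier mu_a to an element of R_n: the coefficient in
# position i moves to position i*a mod n (a bijection when gcd(a, n) = 1).

def mu(poly, a, n):
    out = [0] * n
    for i, coeff in enumerate(poly):
        out[(i * a) % n] = coeff    # Python's % also handles negative a
    return out

e = [0, 1, 1, 0, 1, 0, 0]   # e(x) = x + x^2 + x^4 in R_7
print(mu(e, 3, 7))          # [0, 0, 0, 1, 0, 1, 1] = x^3 + x^5 + x^6
print(mu(e, 2, 7))          # equal to e itself
```

The image x^3 + x^5 + x^6 under µ_3 is the generating idempotent of the other [7, 4] code in Example 4.3.4, so µ_3 swaps the two Hamming codes.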
Theorem 4.3.12 Let f (x) and g(x) be elements of Rn . Suppose e(x) is an idempotent of
Rn . Let a be relatively prime to n. Then:
(i) if b ≡ a (mod n), then µb = µa ,
(ii) ( f (x) + g(x))µa = f (x)µa + g(x)µa ,
(iii) ( f (x)g(x))µa = ( f (x)µa )(g(x)µa ),
(iv) µa is an automorphism of Rn ,
(v) e(x)µa is an idempotent of Rn , and
(vi) µq leaves invariant each q-cyclotomic coset modulo n and has order equal to
ordn (q).
Exercise 230 Prove that if gcd(a, n) = 1, then the map µ_a is indeed a permutation of {0, 1, ..., n − 1} as claimed in the text. What happens if gcd(a, n) ≠ 1?
Exercise 231 Prove Theorem 4.3.12.
Theorem 4.3.13 Let C be a cyclic code of length n over Fq with generating idempotent
e(x). Let a be an integer with gcd(a, n) = 1. Then:
(i) Cµ_a = ⟨e(x)µ_a⟩ and e(x)µ_a is the generating idempotent of the cyclic code Cµ_a, and
(ii) e(x)µq = e(x) and µq ∈ PAut(C).
Proof: Using Theorem 4.3.12(iii), Cµ_a = {(e(x) f(x))µ_a | f(x) ∈ R_n} = {(e(x)µ_a)(f(x)µ_a) | f(x)µ_a ∈ R_n} = {(e(x)µ_a)h(x) | h(x) ∈ R_n} = ⟨e(x)µ_a⟩ as µ_a is an automorphism of R_n by Theorem 4.3.12(iv). Hence Cµ_a is cyclic and has generating idempotent e(x)µ_a by Theorem 4.3.12(v), proving (i).

If we show that e(x)µ_q = e(x), then by part (i), Cµ_q = C and so µ_q ∈ PAut(C). By Theorem 4.3.8(vi), e(x) = Σ_{i∈T} ê_i(x) for some set T. By Theorem 4.3.12(ii), e(x)µ_q = e(x) if ê_i(x)µ_q = ê_i(x) for all i. But ê_i(x)µ_q = ê_i(x^q) = (ê_i(x))^q by Theorem 3.7.4, the latter certainly being a nonzero element of ⟨ê_i(x)⟩. But by Theorem 4.3.12(v), ê_i(x)µ_q is also an idempotent of ⟨ê_i(x)⟩. Hence ê_i(x)µ_q = ê_i(x) by Theorem 4.3.8(v).
Exercise 232 Consider the cyclic codes of length 11 over F3 as given in Example 4.3.5.
(a) Find the image of each generating idempotent, and hence each cyclic code, under µ2 .
(b) Verify that µ3 fixes each idempotent.
(c) Write the image of each generator polynomial under µ3 as an element of R11 . Do
generator polynomials get mapped to generator polynomials?
Exercise 233 Show that any two codes of the same dimension in Examples 4.3.4 and 4.3.5
are permutation equivalent.
Note that arbitrary permutations in general do not map idempotents to idempotents, nor
do they even map cyclic codes to cyclic codes.
Corollary 4.3.14 Let C be a cyclic code of length n over Fq . Let A be the group of order
n generated by the cyclic shift i → i + 1 (mod n). Let B be the group of order ordn (q)
generated by the multiplier µq . Then the group G of order n · ordn (q) generated by A and
B is a subgroup of PAut(C).
Proof: The corollary follows from the structure of the normalizer of A in the symmetric
group Symn and Theorem 4.3.13(ii). In fact, G is the semidirect product of A extended
by B.
Exercise 234 In the notation of Corollary 4.3.14, what is the order of the subgroup G of
PAut(C) for the following values of n and q?
(a) n = 15, q = 2.
(b) n = 17, q = 2.
(c) n = 23, q = 2.
(d) n = 15, q = 4.
(e) n = 25, q = 3.
Corollary 4.3.15 Let C be a cyclic code of length n over F_q with generating idempotent e(x) = Σ_{i=0}^{n−1} e_i x^i. Then:
(i) e_i = e_j if i and j are in the same q-cyclotomic coset modulo n,
(ii) if q = 2,

e(x) = Σ_{j∈J} Σ_{i∈C_j} x^i,

where J is some subset of representatives of 2-cyclotomic cosets modulo n, and
(iii) if q = 2, every element of R_n of the form

Σ_{j∈J} Σ_{i∈C_j} x^i,

where J is some subset of representatives of 2-cyclotomic cosets modulo n, is an idempotent of R_n.

Proof: By Theorem 4.3.13(ii), e(x)µ_q = e(x). Thus Σ_{i=0}^{n−1} e_i x^{iq} ≡ e(x)µ_q = e(x) ≡ Σ_{i=0}^{n−1} e_{iq} x^{iq} (mod x^n − 1), where subscripts are read modulo n. Hence (i) holds, and (ii) is a special case of (i). Part (iii) follows as e(x)^2 = Σ_{j∈J} Σ_{i∈C_j} x^{2i} = Σ_{j∈J} Σ_{i∈C_j} x^i = e(x) by Exercise 152 and the fact that 2C_j ≡ C_j (mod n).
Since any idempotent is a generating idempotent of some code, the preceding corollary
shows that each idempotent in R_n has the form

e(x) = Σ_j a_j Σ_{i∈C_j} x^i,    (4.5)
where the outer sum is over a system of representatives of the q-cyclotomic cosets modulo n
and each a j is in Fq . For q = 2, but not for arbitrary q, all such expressions are idempotents.
(Compare Examples 4.3.4 and 4.3.5.)
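For q = 2 this gives a complete, finite recipe that is easy to enumerate by machine. The sketch below (mine, not the book's) lists the 2-cyclotomic cosets modulo 7 and confirms that every coset-sum of the binary form above squares to itself in R_7, reproducing the eight idempotents of Example 4.3.4.

```python
from itertools import combinations

def cyclotomic_cosets(n, q):
    """Partition {0, ..., n-1} into q-cyclotomic cosets modulo n."""
    left, cosets = set(range(n)), []
    while left:
        s, coset = min(left), set()
        x = s
        while x not in coset:
            coset.add(x)
            x = (x * q) % n
        cosets.append(sorted(coset))
        left -= coset
    return cosets

def square_mod_gf2(e, n):
    """Square a binary polynomial modulo x^n - 1: x^i maps to x^(2i mod n)."""
    out = [0] * n
    for i, bit in enumerate(e):
        if bit:
            out[(2 * i) % n] ^= 1
    return out

n = 7
cosets = cyclotomic_cosets(n, 2)
print(cosets)                      # [[0], [1, 2, 4], [3, 5, 6]]
count = 0
for r in range(len(cosets) + 1):
    for choice in combinations(cosets, r):
        e = [0] * n
        for coset in choice:
            for i in coset:
                e[i] = 1
        assert square_mod_gf2(e, n) == e   # each coset sum is idempotent
        count += 1
print(count)                       # 8, matching Example 4.3.4
```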
We can also give the general form for the idempotents in R_n over F_4. We can construct a set S of representatives of all the distinct 4-cyclotomic cosets modulo n as follows. The set S = K ∪ L_1 ∪ L_2, where K, L_1, and L_2 are pairwise disjoint. K consists of distinct representatives k where C_k = C_{2k}. L_1 and L_2 are chosen so that if k ∈ L_1 ∪ L_2, then C_k ≠ C_{2k}; furthermore L_2 = {2k | k ∈ L_1}. Squaring e(x) in (4.5), we obtain

e(x)^2 = Σ_{j∈S} a_j^2 Σ_{i∈C_j} x^{2i},

as R_n has characteristic 2. But if j ∈ K, then Σ_{i∈C_j} x^{2i} = Σ_{i∈C_j} x^i; if j ∈ L_1, then Σ_{i∈C_j} x^{2i} = Σ_{i∈C_{2j}} x^i and 2j ∈ L_2; and if 2j ∈ L_2, then Σ_{i∈C_{2j}} x^{2i} = Σ_{i∈C_j} x^i and j ∈ L_1, as i and 4i are in the same 4-cyclotomic coset. Therefore e(x) is an idempotent if and only if a_j^2 = a_j for all j ∈ K and a_{2j} = a_j^2 for all j ∈ L_1. In particular e(x) is an idempotent in R_n if and only if

e(x) = Σ_{j∈K} a_j Σ_{i∈C_j} x^i + Σ_{j∈L_1} ( a_j Σ_{i∈C_j} x^i + a_j^2 Σ_{i∈C_j} x^{2i} ),    (4.6)

where a_j ∈ {0, 1} if j ∈ K. Recall that in F_4, ¯ is called conjugation and is given by 0̄ = 0, 1̄ = 1, and ω̄ = ω^2; alternately, ā = a^2. If e(x) is the generating idempotent of C, then we leave it as an exercise to show that C̄ is a cyclic code with generating idempotent ē(x) = Σ_j ā_j Σ_{i∈C_j} x^i. Furthermore, by examining (4.6), we see that ē(x) = e(x)µ_2. By Theorem 4.3.13, C̄ = Cµ_2. We summarize these results.
Theorem 4.3.16 Let C be a cyclic code over F_4 with generating idempotent e(x). Then e(x)
has the form given in (4.6). Also C̄ = Cµ_2 is cyclic with generating idempotent ē(x) = e(x)µ_2.
Exercise 235 Show that if e(x) is the generating idempotent of a cyclic code C over F_4,
then C̄ is a cyclic code with generating idempotent ē(x) = ∑_j ā_j ∑_{i∈C_j} x^i. Show also that
ē(x) = e(x)µ_2.
Exercise 236 Do the following:
(a) List the 4-cyclotomic cosets modulo 21.
(b) Construct a set S = K ∪ L 1 ∪ L 2 of distinct 4-cyclotomic coset representatives modulo
21 which can be used to construct idempotents in R21 over F4 as in the discussion prior
to Theorem 4.3.16.
(c) Give the general form of such an idempotent.
(d) How many of these idempotents are there?
(e) Write down four of these idempotents.
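The cosets and counts asked for in Exercise 236 can be checked mechanically. The following sketch (the helper names `cyclotomic_cosets` and `partition_K_L1_L2` are ours, not the book's) computes the q-cyclotomic cosets modulo n and splits the 4-cyclotomic coset representatives into the sets K, L_1, L_2 of the discussion before Theorem 4.3.16; by (4.6) the number of idempotents in R_21 over F_4 is then 2^{|K|} · 4^{|L_1|}.

```python
def cyclotomic_cosets(q, n):
    """Return the q-cyclotomic cosets modulo n, each sorted, smallest element first."""
    seen, cosets = set(), []
    for s in range(n):
        if s in seen:
            continue
        coset, i = [], s
        while i not in coset:
            coset.append(i)
            i = i * q % n
        seen.update(coset)
        cosets.append(sorted(coset))
    return cosets

def partition_K_L1_L2(n):
    """Split representatives of the 4-cyclotomic cosets mod n into K, L1, L2."""
    cosets = {c[0]: set(c) for c in cyclotomic_cosets(4, n)}
    rep = {i: r for r, c in cosets.items() for i in c}   # element -> representative
    K, L1, L2 = [], [], []
    for r in cosets:
        double = rep[2 * r % n]                          # representative of C_{2r}
        if double == r:
            K.append(r)                                  # C_k = C_{2k}
        elif double not in L1 and r not in L2:
            L1.append(r)                                 # put r in L1, its pair in L2
            L2.append(double)
    return K, L1, L2

K, L1, L2 = partition_K_L1_L2(21)
# a_j in {0, 1} for j in K; a_j free in F_4 for j in L1 (a_{2j} is then determined)
num_idempotents = 2 ** len(K) * 4 ** len(L1)
```

For n = 21 this yields K = {0, 3, 9} and L_1, L_2 covering the remaining six cosets, so there are 2³ · 4³ = 512 idempotents.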
Theorem 4.3.13 shows that µa maps cyclic codes to cyclic codes with the generating
idempotent mapped to the generating idempotent; however, the generator polynomial may
not be mapped to the generator polynomial of the image code. In fact, the automorphism
µq maps the generator polynomial to its qth power. See Exercise 232.
A multiplier takes a cyclic code into an equivalent cyclic code. The following theorem, a
special case of a theorem of Pálfy (see [150]), implies that, in certain instances, two cyclic
codes are permutation equivalent if and only if a multiplier takes one to the other. This is a
very powerful result when it applies.
Theorem 4.3.17 Let C 1 and C 2 be cyclic codes of length n over Fq . Assume that
gcd(n, φ(n)) = 1, where φ is the Euler φ-function. Then C 1 and C 2 are permutation equivalent if and only if there is a multiplier that maps C 1 to C 2 .
Since multipliers send generating idempotents to generating idempotents, we have the
following corollary.
Corollary 4.3.18 Let C 1 and C 2 be cyclic codes of length n over Fq . Assume that
gcd(n, φ(n)) = 1, where φ is the Euler φ-function. Then C 1 and C 2 are permutation equivalent if and only if there is a multiplier that maps the idempotent of C 1 to the idempotent of C 2 .
4.4 Zeros of a cyclic code
Recall from Section 4.1 and, in particular, Theorem 4.1.1, that if t = ord_n(q), then F_{q^t}
is a splitting field of x^n − 1; so F_{q^t} contains a primitive nth root of unity α, and
x^n − 1 = ∏_{i=0}^{n−1} (x − α^i) is the factorization of x^n − 1 into linear factors over F_{q^t}. Fur-
thermore x^n − 1 = ∏_s M_{α^s}(x) is the factorization of x^n − 1 into irreducible factors
over F_q, where s runs through a set of representatives of the q-cyclotomic cosets
modulo n.
Let C be a cyclic code in R_n with generator polynomial g(x). By Theorems 4.1.1(i) and
4.2.1(vii), g(x) = ∏_s M_{α^s}(x) = ∏_s ∏_{i∈C_s} (x − α^i), where s runs through some subset of
representatives of the q-cyclotomic cosets C_s modulo n. Let T = ∪_s C_s be the union of
these q-cyclotomic cosets. The roots of unity Z = {α^i | i ∈ T} are called the zeros of the
cyclic code C and {α^i | i ∉ T} are the nonzeros of C. The set T is called the defining set of
C. (Note that if you change the primitive nth root of unity, you change T ; so T is computed
relative to a fixed primitive root. This will be discussed further in Section 4.5.) It follows
that c(x) belongs to C if and only if c(α i ) = 0 for each i ∈ T by Theorem 4.2.1. Notice
that T , and hence either the set of zeros or the set of nonzeros, completely determines the
generator polynomial g(x). By Theorem 4.2.1, the dimension of C is n − |T | as |T | is the
degree of g(x).
Example 4.4.1 In Example 4.3.4 a table giving the dimension, generator polynomials gi (x),
and generating idempotents ei (x) of all the cyclic codes C i of length 7 over F2 was given.
We add to that table the defining sets of each code relative to the primitive root α given in
Example 3.4.3.
i   dim   g_i(x)                    e_i(x)                    Defining set
0    0    1 + x^7                   0                         {0, 1, 2, 3, 4, 5, 6}
1    1    1 + x + x^2 + ··· + x^6   1 + x + x^2 + ··· + x^6   {1, 2, 3, 4, 5, 6}
2    3    1 + x^2 + x^3 + x^4       1 + x^3 + x^5 + x^6       {0, 1, 2, 4}
3    3    1 + x + x^2 + x^4         1 + x + x^2 + x^4         {0, 3, 5, 6}
4    4    1 + x + x^3               x + x^2 + x^4             {1, 2, 4}
5    4    1 + x^2 + x^3             x^3 + x^5 + x^6           {3, 5, 6}
6    6    1 + x                     x + x^2 + ··· + x^6       {0}
7    7    1                         1                         ∅
Exercise 237 What would be the defining sets of each of the codes in Example 4.4.1 if the
primitive root β = α 3 were used to determine the defining set rather than α?
Our next theorem gives basic properties of cyclic codes in terms of their defining sets,
summarizing the above discussion.
Theorem 4.4.2 Let α be a primitive nth root of unity in some extension field of Fq . Let C
be a cyclic code of length n over Fq with defining set T and generator polynomial g(x).
The following hold.
(i) T is a union of q-cyclotomic cosets modulo n.
(ii) g(x) = ∏_{i∈T} (x − α^i).
(iii) c(x) ∈ Rn is in C if and only if c(α i ) = 0 for all i ∈ T .
(iv) The dimension of C is n − |T |.
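Theorem 4.4.2 makes the classification of cyclic codes purely combinatorial: the defining sets are exactly the unions of q-cyclotomic cosets, and each code has dimension n − |T|. A small sketch for q = 2, n = 7 (helper and variable names are ours) reproduces the eight codes and dimensions of Example 4.4.1:

```python
from itertools import combinations

def cyclotomic_cosets(q, n):
    """Return the q-cyclotomic cosets modulo n, each sorted."""
    seen, cosets = set(), []
    for s in range(n):
        if s not in seen:
            c, i = [], s
            while i not in c:
                c.append(i)
                i = i * q % n
            seen.update(c)
            cosets.append(sorted(c))
    return cosets

cosets = cyclotomic_cosets(2, 7)          # [[0], [1, 2, 4], [3, 5, 6]]
defining_sets = []
for r in range(len(cosets) + 1):          # every union of cosets is a defining set
    for choice in combinations(cosets, r):
        defining_sets.append(sorted(i for c in choice for i in c))
dims = sorted(7 - len(T) for T in defining_sets)
# dims == [0, 1, 3, 3, 4, 4, 6, 7], matching the table in Example 4.4.1
```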
Exercise 238 Let C be a cyclic code over Fq with defining set T and generator polynomial
g(x). Let C e be the subcode of all even-like vectors in C.
(a) Prove that C e is cyclic and has defining set T ∪ {0}.
(b) Prove that C = C e if and only if 0 ∈ T if and only if g(1) = 0.
(c) Prove that if C ≠ C_e, then the generator polynomial of C_e is (x − 1)g(x).
(d) Prove that if C is binary, then C contains the all-one vector if and only if 0 ∈ T .
Exercise 239 Let C i be cyclic codes of length n over Fq with defining sets Ti for
i = 1, 2.
(a) Prove that C 1 ∩ C 2 has defining set T1 ∪ T2 .
(b) Prove that C 1 + C 2 has defining set T1 ∩ T2 .
(c) Prove that C 1 ⊆ C 2 if and only if T2 ⊆ T1 .
Note: This exercise shows that the lattice of cyclic codes of length n over Fq , where the
join of two codes is the sum of the codes and the meet of two codes is their intersection, is
isomorphic to the “upside-down” version of the lattice of subsets of N = {0, 1, . . . , n − 1}
that are unions of q-cyclotomic cosets modulo n, where the join of two such subsets is
the set union of the subsets and the meet of two subsets is the set intersection of the
subsets.
The zeros of a cyclic code can be used to obtain a parity check matrix (possibly with
dependent rows) as explained in the next theorem. The construction presented in this theorem
is analogous to that of the subfield subcode construction in Section 3.8.
Theorem 4.4.3 Let C be an [n, k] cyclic code over Fq with zeros Z in a splitting field Fq t
of x n − 1 over Fq . Let α ∈ Fq t be a primitive nth root of unity in Fq t , and let Z = {α j |
j ∈ Ci1 ∪ · · · ∪ Ciw }, where Ci1 , . . . , Ciw are distinct q-cyclotomic cosets modulo n. Let L
be the w × n matrix over Fq t defined by
L = [ 1   α^{i_1}   α^{2i_1}   ···   α^{(n−1)i_1}
      1   α^{i_2}   α^{2i_2}   ···   α^{(n−1)i_2}
                       ⋮
      1   α^{i_w}   α^{2i_w}   ···   α^{(n−1)i_w} ] .
Then c is in C if and only if LcT = 0. Choosing a basis of Fq t over Fq , we may represent
each element of Fq t as a t × 1 column vector over Fq . Replacing each entry of L by its
corresponding column vector, we obtain a tw × n matrix H over Fq which has the property
that c ∈ C if and only if H cT = 0. In particular, k ≥ n − tw.
Proof: We have c(x) ∈ C if and only if c(α j ) = 0 for all j ∈ Ci1 ∪ · · · ∪ Ciw , which by
Theorem 3.7.4 is equivalent to c(α i j ) = 0 for 1 ≤ j ≤ w. Clearly, this is equivalent to
LcT = 0, which is a system of homogeneous linear equations with coefficients that are
powers of α. Expanding each of these powers of α in the chosen basis of Fq t over Fq yields
the equivalent system H cT = 0. As the rows of H may be dependent, k ≥ n − tw.
If C ′ is the code over Fq t with parity check matrix L in this theorem, then the code C is
actually the subfield subcode C ′ |Fq .
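As a concrete instance of Theorem 4.4.3, take q = 2, n = 7, t = 3 and the single coset C_1 = {1, 2, 4}, so w = 1 and C is the [7, 4] Hamming code with zeros {α, α^2, α^4}. The sketch below (our own encoding, an assumption not in the text: F_8 as 3-bit integers reduced modulo x^3 + x + 1, with α ↦ 0b010) builds the row L and expands it into a 3 × 7 binary matrix H in the basis {1, α, α^2}:

```python
def gf8_mul(a, b):
    """Multiply two elements of F_8 represented as 3-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011          # reduce by x^3 + x + 1
        b >>= 1
    return r

alpha_pow = [1]
for _ in range(6):
    alpha_pow.append(gf8_mul(alpha_pow[-1], 0b010))   # 1, α, α^2, ..., α^6

L = [alpha_pow[j] for j in range(7)]                  # the single row of L (i_1 = 1)
# Expand each F_8 entry in the basis {1, α, α^2}: a 3 x 7 binary matrix H
H = [[(e >> bit) & 1 for e in L] for bit in range(3)]

# g(x) = 1 + x + x^3 = M_α(x); its coefficient vector must satisfy H c^T = 0
c = [1, 1, 0, 1, 0, 0, 0]
syndrome = [sum(H[r][i] * c[i] for i in range(7)) % 2 for r in range(3)]
```

The columns of H are the seven nonzero binary triples, the familiar parity check matrix of the Hamming code, and the syndrome of the generator polynomial's coefficient vector comes out zero.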
Exercise 240 Show that the matrix L in Theorem 4.4.3 has rank w. Note: The matrix L is
related to a Vandermonde matrix. See Lemma 4.5.1.
For a cyclic code C in R_n, there are in general many polynomials v(x) in R_n such
that C = ⟨v(x)⟩. However, by Theorem 4.2.1 and its corollary, there is exactly one such
polynomial, namely the monic polynomial in C of minimal degree, which also divides x n − 1
and which we call the generator polynomial of C. In the next theorem we characterize all
polynomials v(x) which generate C.
Theorem 4.4.4 Let C be a cyclic code of length n over Fq with generator polynomial g(x).
Let v(x) be a polynomial in Rn .
(i) C = ⟨v(x)⟩ if and only if gcd(v(x), x^n − 1) = g(x).
(ii) v(x) generates C if and only if the nth roots of unity which are zeros of v(x) are precisely
the zeros of C.
Proof: First assume that gcd(v(x), x^n − 1) = g(x). As g(x) | v(x), multiples of v(x) are
multiples of g(x) in R_n and so ⟨v(x)⟩ ⊆ C. By the Euclidean Algorithm there exist
polynomials a(x) and b(x) in F_q[x] such that g(x) = a(x)v(x) + b(x)(x^n − 1). Hence
g(x) = a(x)v(x) in R_n and so multiples of g(x) are multiples of v(x) in R_n implying
⟨v(x)⟩ ⊇ C. Thus C = ⟨v(x)⟩.
For the converse, assume that C = ⟨v(x)⟩. Let d(x) = gcd(v(x), x^n − 1). As g(x) | v(x)
and g(x) | (x^n − 1) by Theorem 4.2.1, g(x) | d(x) by Exercise 158. As g(x) ∈ C = ⟨v(x)⟩,
there exists a polynomial a(x) such that g(x) = a(x)v(x) in Rn . So there exists a polynomial
b(x) such that g(x) = a(x)v(x) + b(x)(x n − 1) in Fq [x]. Thus d(x) | g(x) by Exercise 158.
Hence as both d(x) and g(x) are monic and divide each other, d(x) = g(x) and (i) holds.
As the only roots of both g(x) and x n − 1 are nth roots of unity, g(x) = gcd(v(x), x n − 1)
if and only if the nth roots of unity which are zeros of v(x) are precisely the zeros of g(x);
the latter are the zeros of C.
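Theorem 4.4.4(i) is easy to test by machine. In the sketch below (bit-mask polynomials over F_2, bit i holding the coefficient of x^i; the helper names are ours), v(x) = x g(x) generates C = ⟨g(x)⟩ because x is coprime to x^7 − 1, and even u(x) = (1 + x)^2(1 + x + x^3) generates the same code: since x^7 − 1 is squarefree, the gcd collapses back to g(x) = (1 + x)(1 + x + x^3).

```python
def deg(p):
    return p.bit_length() - 1

def poly_mod(a, b):
    """Remainder of a divided by b over F_2 (polynomials as bit masks)."""
    while a and deg(a) >= deg(b):
        a ^= b << (deg(a) - deg(b))
    return a

def poly_gcd(a, b):
    while b:
        a, b = b, poly_mod(a, b)
    return a

g   = 0b11101      # g(x) = 1 + x^2 + x^3 + x^4, generator of C_2 in Example 4.4.1
xn1 = 0b10000001   # x^7 - 1 = x^7 + 1 over F_2
v   = g << 1       # v(x) = x g(x)
u   = 0b100111     # u(x) = (1 + x)^2 (1 + x + x^3) = 1 + x + x^2 + x^5
w   = 0b11         # w(x) = 1 + x
assert poly_gcd(v, xn1) == g        # v(x) generates C
assert poly_gcd(u, xn1) == g        # u(x) also generates C
assert poly_gcd(w, xn1) == 0b11     # gcd is 1 + x, not g(x): <w(x)> is a different code
```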
Corollary 4.4.5 Let C be a cyclic code of length n over F_q with zeros {α^i | i ∈ T} for
some primitive nth root of unity α, where T is the defining set of C. Let a be an integer such
that gcd(a, n) = 1 and let a^{−1} be the multiplicative inverse of a in the integers modulo n.
Then {α^{a^{−1} i} | i ∈ T} are the zeros of the cyclic code Cµ_a and a^{−1}T mod n is the defining
set for Cµ_a.
Proof: Let e(x) be the generating idempotent of C. By Theorem 4.3.13, the generating
idempotent of the cyclic code Cµa is e(x)µa . By Theorem 4.4.4, the zeros of C and Cµa
are the nth roots of unity which are also roots of e(x) and e′ (x) = e(x)µa , respectively. As
e′ (x) = e(x)µa ≡ e(x a ) (mod x n − 1), e′ (x) = e(x a ) + b(x)(x n − 1) in Fq [x]. The corollary now follows from the fact that the nth root of unity α j is a root of e′ (x) if and only if
α a j is a root of e(x).
Theorem 4.3.13 implies that the image of one special vector, the generating idempotent, of
a cyclic code under a multiplier determines the image code. As described in Corollary 4.4.5
a multiplier maps the defining set, and hence the zeros, of a cyclic code to the defining
set, and hence the zeros, of the image code. Such an assertion is not true for a general
permutation.
Exercise 241 An equivalence class of codes is the set of all codes that are equivalent
to one another. Give a defining set for a representative of each equivalence class of the
binary cyclic codes of length 15. Example 3.7.8, Theorem 4.3.17, and Corollary 4.4.5 will
be useful.
Exercise 242 Continuing with Exercise 241, do the following:
(a) List the 2-cyclotomic cosets modulo 31.
(b) List the defining sets for all [31, 26] binary cyclic codes. Give a defining set for a
representative of each equivalence class of the [31, 26] binary cyclic codes. Hint: Use
Theorem 4.3.17 and Corollary 4.4.5.
(c) Repeat part (b) for [31, 5] binary cyclic codes. (Take advantage of your work in
part (b).)
(d) List the 15 defining sets for all [31, 21] binary cyclic codes, and give a defining set for
a representative of each equivalence class of these codes.
(e) List the 20 defining sets for all [31, 16] binary cyclic codes, and give a defining set for
a representative of each equivalence class of these codes.
If C is a code of length n over F_q, then a complement of C is a code C^c such that C + C^c =
F_q^n and C ∩ C^c = {0}. In general, the complement is not unique. However, Exercise 243
shows that if C is a cyclic code, there is a unique complement of C that is also cyclic. We
call this code the cyclic complement of C. In the following theorem we give the generator
polynomial and generating idempotent of the cyclic complement.
Exercise 243 Prove that a cyclic code has a unique complement that is also cyclic.
Theorem 4.4.6 Let C be a cyclic code of length n over Fq with generator polynomial g(x),
generating idempotent e(x), and defining set T . Let C c be the cyclic complement of C. The
following hold.
(i) h(x) = (x n − 1)/g(x) is the generator polynomial for C c and 1 − e(x) is its generating
idempotent.
(ii) C c is the sum of the minimal ideals of Rn not contained in C.
(iii) If N = {0, 1, . . . , n − 1}, then N \ T is the defining set of C c .
Exercise 244 Prove Theorem 4.4.6.
The dual C ⊥ of a cyclic code C is also cyclic as Theorem 4.2.6 shows. The generator polynomial and generating idempotent for C ⊥ can be obtained from the generator
polynomial and generating idempotent of C. To find these, we reintroduce the concept of
the reciprocal polynomial encountered in Exercise 192. Let f(x) = f_0 + f_1 x + ··· + f_a x^a
be a polynomial of degree a in F_q[x]. The reciprocal polynomial of f(x) is the
polynomial

f*(x) = x^a f(x^{−1}) = x^a (f(x)µ_{−1}) = f_a + f_{a−1} x + ··· + f_0 x^a.
So f ∗ (x) has coefficients the reverse of those of f (x). Furthermore, f (x) is reversible
provided f (x) = f ∗ (x).
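On coefficient lists (index i holding f_i, with f_a ≠ 0), taking the reciprocal polynomial is just reversal. A quick sketch (names ours):

```python
def reciprocal(f):
    """f*(x) = x^a f(1/x) for f of degree a = len(f) - 1: reverse the coefficients."""
    return f[::-1]

f = [1, 1, 0, 1]                              # f(x) = 1 + x + x^3
assert reciprocal(f) == [1, 0, 1, 1]          # f*(x) = 1 + x^2 + x^3
assert reciprocal([1, 1, 1]) == [1, 1, 1]     # 1 + x + x^2 is reversible
```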
Exercise 245 Show that a monic irreducible reversible polynomial of degree greater than
1 cannot be a primitive polynomial except for the polynomial 1 + x + x 2 over F2 .
We have the following basic properties of reciprocal polynomials. Their proofs are left
as an exercise.
Lemma 4.4.7 Let f (x) ∈ Fq [x].
(i) If β1 , . . . , βr are the nonzero roots of f in some extension field of Fq , then β1−1 , . . . ,
βr−1 are the nonzero roots of f ∗ in that extension field.
(ii) If f (x) is irreducible over Fq , so is f ∗ (x).
(iii) If f (x) is a primitive polynomial, so is f ∗ (x).
Exercise 246 Prove Lemma 4.4.7.
Exercise 247 In Example 3.7.8, the factorization of x 15 − 1 into irreducible polynomials
over F2 was found. Find the reciprocal polynomial of each of these irreducible polynomials.
How does this confirm Lemma 4.4.7?
Exercise 248 Prove that if f 1 (x) and f 2 (x) are reversible polynomials in Fq [x], so is
f 1 (x) f 2 (x). What about f 1 (x) + f 2 (x)?
The connection between dual codes and reciprocal polynomials is clear from the following lemma.
Lemma 4.4.8 Let a = a0 a1 · · · an−1 and b = b0 b1 · · · bn−1 be vectors in Fqn with associated
polynomials a(x) and b(x). Then a is orthogonal to b and all its shifts if and only if
a(x)b∗ (x) = 0 in Rn .
Proof: Let b^(i) = b_i b_{i+1} ··· b_{n+i−1} be the ith cyclic shift of b, where the subscripts are read
modulo n. Then

a · b^(i) = 0 if and only if ∑_{j=0}^{n−1} a_j b_{j+i} = 0.    (4.7)

But a(x)b*(x) = 0 in R_n if and only if a(x)(x^{n−1−deg b(x)})b*(x) = 0 in R_n. But
a(x)(x^{n−1−deg b(x)})b*(x) = ∑_{i=0}^{n−1} ( ∑_{j=0}^{n−1} a_j b_{j+i} ) x^{n−1−i}. Thus a(x)b*(x) = 0 in R_n if and
only if (4.7) holds for 0 ≤ i ≤ n − 1.
We now give the generator polynomial and generating idempotent of the dual of a cyclic
code. The proof is left as an exercise.
Theorem 4.4.9 Let C be an [n, k] cyclic code over Fq with generator polynomial g(x),
generating idempotent e(x), and defining set T . Let h(x) = (x n − 1)/g(x). The following
hold.
(i) C^⊥ is a cyclic code and C^⊥ = C^c µ_{−1}.
(ii) C^⊥ has generating idempotent 1 − e(x)µ_{−1} and generator polynomial x^k h(x^{−1})/h(0).
(iii) If β_1, . . . , β_{n−k} are the zeros of C, then β_1^{−1}, . . . , β_{n−k}^{−1} are the nonzeros of C^⊥.
(iv) If N = {0, 1, . . . , n − 1}, then N \ (−1)T mod n is the defining set of C ⊥ .
(v) Precisely one of C and C ⊥ is odd-like and the other is even-like.
The polynomial h(x) = (x n − 1)/g(x) in this theorem is called the check polynomial of
C. The generator polynomial of C ⊥ in part (ii) of the theorem is the reciprocal polynomial
of h(x) rescaled to be monic.
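Part (iv) of Theorem 4.4.9 can be checked directly against the length-7 table of Example 4.4.1, where C_2 and C_4 (and likewise C_3 and C_5) are dual pairs. A sketch (function name ours):

```python
def dual_defining_set(T, n):
    """Defining set of the dual code: N \ (-1)T mod n (Theorem 4.4.9(iv))."""
    minus_T = {(-t) % n for t in T}
    return sorted(set(range(n)) - minus_T)

assert dual_defining_set([0, 1, 2, 4], 7) == [1, 2, 4]   # C_2 -> its dual C_4
assert dual_defining_set([0, 3, 5, 6], 7) == [3, 5, 6]   # C_3 -> its dual C_5
```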
Exercise 249 Prove Theorem 4.4.9.
Exercise 250 Let C be a cyclic code with cyclic complement C c . Prove that if C is MDS
so is C c .
The following corollary determines, from the generator polynomial, when a cyclic code
is self-orthogonal.
Corollary 4.4.10 Let C be a cyclic code over Fq of length n with generator polynomial
g(x) and check polynomial h(x) = (x n − 1)/g(x). Then C is self-orthogonal if and only if
h ∗ (x) | g(x).
Exercise 251 Prove Corollary 4.4.10.
Exercise 252 Using Corollary 4.4.10 and Examples 3.7.8, 4.2.4, and 4.3.4 give the generator polynomials of the self-orthogonal binary cyclic codes of lengths 7, 9, and 15.
In the next theorem we show how to decide when a cyclic code is self-orthogonal.
In particular, this characterization shows that all self-orthogonal cyclic codes are even-
like. In this theorem we use the observation that if C is a q-cyclotomic coset modulo n,
either Cµ_{−1} = C or Cµ_{−1} = C′ for some different q-cyclotomic coset C′, in which case
C′µ_{−1} = C as µ_{−1}^2 is the identity.
Theorem 4.4.11 Let C be a self-orthogonal cyclic code over Fq of length n with defining set T . Let C1 , . . . , Ck , D1 , . . . , Dℓ , E 1 , . . . , E ℓ be all the distinct q-cyclotomic cosets
modulo n partitioned so that Ci = Ci µ−1 for 1 ≤ i ≤ k and Di = E i µ−1 for 1 ≤ i ≤ ℓ.
The following hold.
(i) Ci ⊆ T for 1 ≤ i ≤ k and at least one of Di or E i is contained in T for 1 ≤ i ≤ ℓ.
(ii) C is even-like.
(iii) C ∩ Cµ−1 = {0}.
Conversely, if C is a cyclic code with defining set T that satisfies (i), then C is self-orthogonal.
Proof: Let N = {0, 1, . . . , n − 1}. Let T^⊥ be the defining set of C^⊥. By Theorem 4.4.9,
T^⊥ = N \ (−1)T mod n. As C ⊆ C^⊥, N \ (−1)T mod n ⊆ T by Exercise 239. If C_i ⊄ T,
then C_i ⊄ (−1)T mod n because C_i = C_i µ_{−1}, implying that C_i ⊆ N \ (−1)T mod n ⊆ T,
a contradiction. If D_i ⊄ T, then E_i ⊄ (−1)T mod n because D_i = E_i µ_{−1}, implying that
E_i ⊆ N \ (−1)T mod n ⊆ T, proving (i). Part (ii) follows from part (i) and Exercise 238
as C_i = {0} for some i. By Corollary 4.4.5, Cµ_{−1} has defining set (−1)T mod n. By (i),
T ∪ (−1)T mod n = N, yielding (iii) using Exercise 239.
For the converse, assume T satisfies (i). We only need to show that T^⊥ ⊆ T, where T^⊥ =
N \ (−1)T mod n by Exercise 239. As C_i ⊆ T for 1 ≤ i ≤ k, C_i ⊆ (−1)T mod n, implying
C_i ⊄ T^⊥. Hence T^⊥ is a union of some D_i s and E_i s. If D_i ⊆ N \ (−1)T mod n, then
D_i ⊄ (−1)T mod n and so E_i ⊄ T. By (i), D_i ⊆ T. Similarly if E_i ⊆ N \ (−1)T mod n,
then E_i ⊆ T. Hence T^⊥ ⊆ T, implying C is self-orthogonal.
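Condition (i) of Theorem 4.4.11 gives a mechanical self-orthogonality test on the defining set alone. The sketch below (names ours) checks each q-cyclotomic coset: a coset fixed by µ_{−1} must lie in T, and of a pair swapped by µ_{−1} at least one member must lie in T. For q = 2, n = 23 it confirms that the code with defining set C_0 ∪ C_1 is self-orthogonal while the [23, 12] code with defining set C_1 is not.

```python
def cyclotomic_cosets(q, n):
    seen, out = set(), []
    for s in range(n):
        if s not in seen:
            c, i = [], s
            while i not in c:
                c.append(i)
                i = i * q % n
            seen.update(c)
            out.append(frozenset(c))
    return out

def is_self_orthogonal(T, q, n):
    """Condition (i) of Theorem 4.4.11 on the defining set T."""
    T = set(T)
    for c in cyclotomic_cosets(q, n):
        minus_c = frozenset((-i) % n for i in c)
        if minus_c == c:                        # C_i µ_{-1} = C_i: C_i must lie in T
            if not c <= T:
                return False
        elif not (c <= T or minus_c <= T):      # pair (D_i, E_i): at least one in T
            return False
    return True

C0 = {0}
C1 = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18}
assert is_self_orthogonal(C0 | C1, 2, 23)       # even-weight subcode: self-orthogonal
assert not is_self_orthogonal(C1, 2, 23)        # the [23, 12] code: 0 is not in T
```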
Exercise 253 Continuing with Exercise 242, do the following:
(a) Show that all [31, 5] binary cyclic codes are self-orthogonal.
(b) Show that there are two inequivalent [31, 15] self-orthogonal binary cyclic codes, and
give defining sets for a code in each equivalence class.
Corollary 4.4.12 Let D = C + 1 be a cyclic code of length n over F_q, where C is self-orthogonal. Then D ∩ Dµ_{−1} = 1.
Exercise 254 Prove Corollary 4.4.12.
Corollary 4.4.13 Let p1 (x), . . . , pk (x), q1 (x), . . . , qℓ (x), r1 (x), . . . , rℓ (x) be the monic irreducible factors of x n − 1 over Fq [x] arranged as follows. For 1 ≤ i ≤ k, pi∗ (x) = ai pi (x)
for some ai ∈ Fq , and for 1 ≤ i ≤ ℓ, ri∗ (x) = bi qi (x) for some bi ∈ Fq . Let C be a cyclic
code of length n over Fq with generator polynomial g(x). Then C is self-orthogonal
if and only if g(x) has factors p1 (x) · · · pk (x) and at least one of qi (x) or ri (x) for
1 ≤ i ≤ ℓ.
Exercise 255 Prove Corollary 4.4.13.
Exercise 256 Using Corollary 4.4.13 and Examples 3.7.8, 4.2.4, and 4.3.4 give the generator polynomials of the self-orthogonal binary cyclic codes of lengths 7, 9, and 15. Compare
your answers to those of Exercise 252.
Exercise 257 Let j(x) = 1 + x + x^2 + ··· + x^{n−1} in R_n and j̄(x) = (1/n) j(x). In Exercise 221 we gave properties of j(x) and j̄(x). Let C be a cyclic code over F_q with generating
idempotent i(x). Let C_e be the subcode of all even-like vectors in C. In Exercise 238 we
found the generator polynomial of C_e.
(a) Prove that 1 − j̄(x) is the generating idempotent of the [n, n − 1] cyclic code over F_q
consisting of all even-like vectors in R_n.
(b) Prove that i(1) = 0 if C = C_e and i(1) = 1 if C ≠ C_e.
(c) Prove that if C ≠ C_e, then i(x) − j̄(x) is the generating idempotent of C_e.
We illustrate Theorems 4.3.13, 4.3.17, 4.4.6, and 4.4.9 by returning to Examples 4.3.4
and 4.3.5.
Example 4.4.14 In Examples 4.3.4 and 4.3.5, the following codes are cyclic complementary pairs: C 1 and C 6 , C 2 and C 5 , and C 3 and C 4 . In both examples, the following are dual
pairs: C 1 and C 6 , C 2 and C 4 , and C 3 and C 5 . In Example 4.3.4, C 2 and C 3 are equivalent
under µ3 , as are C 4 and C 5 . In Example 4.3.5, the same pairs are equivalent under µ2 . In
both examples, the permutation automorphism group for each of C 1 , C 6 , and C 7 is the full
symmetric group. (In general, the permutation automorphism group of the repetition code
of length n, and hence its dual, is the symmetric group on n letters.) In Example 4.3.4, the
group of order 3 generated by µ2 is a subgroup of the automorphism group of the remaining
four codes; in Example 4.3.5, the group of order 5 generated by µ3 is a subgroup of the
automorphism group of the remaining four codes.
Exercise 258 Verify all the claims in Example 4.4.14.
Exercise 259 Let C be a cyclic code of length n over Fq with generator polynomial g(x).
What conditions on g(x) must be satisfied for the dual of C to equal the cyclic complement
of C?
Exercise 260 Identify all binary cyclic codes of lengths 7, 9, and 15 whose duals equal
their cyclic complements. (Examples 3.7.8, 4.2.4, and 4.3.4 will be useful.)
In Theorem 4.4.9 we found the generating idempotent of the dual of any code. The
multiplier µ−1 was key in that theorem. We can also find the generating idempotent of the
Hermitian dual of a cyclic code over F_4. Here µ_{−2} will play the role of µ_{−1}. Recall from
Exercise 8 that if C is a code over F_4, then C^{⊥H} = C̄^⊥.
Theorem 4.4.15 Let C be a cyclic code of length n over F4 with generating idempotent
e(x) and defining set T . The following hold.
(i) C ⊥ H is a cyclic code and C ⊥ H = C c µ−2 , where C c is the cyclic complement of C.
(ii) C ⊥ H has generating idempotent 1 − e(x)µ−2 .
(iii) If N = {0, 1, . . . , n − 1}, then N \ (−2)T mod n is the defining set of C ⊥ H .
(iv) Precisely one of C and C ⊥ H is odd-like and the other is even-like.
Proof: We leave the fact that C^{⊥H} is a cyclic code as an exercise. Exercise 8 shows that
C^{⊥H} = C̄^⊥. By Theorem 4.4.9, C̄^⊥ = C̄^c µ_{−1}. Theorem 4.3.16 shows that C̄ = Cµ_2, so
C̄^c = C^c µ_2, and (i) follows since µ_2 µ_{−1} = µ_{−2}. By Theorem 4.4.6, C^{⊥H} = C^c µ_{−2} has generating idempotent (1 − e(x))µ_{−2} = 1 − e(x)µ_{−2}, giving (ii). Cµ_{−2} has defining set (−2)^{−1}T mod n by
Corollary 4.4.5. However, (−2)^{−1}T = (−2)T modulo n because µ_{−2}^2 = µ_4 and µ_4 fixes all
4-cyclotomic cosets. By Theorem 4.4.6(iii) and Corollary 4.4.5, C^{⊥H} = C^c µ_{−2} has defining set (−2)^{−1}(N \ T) = N \ (−2)T mod n, giving (iii). Part (iv) follows from (iii) and
Exercise 238 as precisely one of T and N \ (−2)T mod n contains 0.
Exercise 261 Prove that if C is a cyclic code over F_4, then C^{⊥H} is also a cyclic code.
Exercise 262 Using Theorem 4.3.16 find generating idempotents of all the cyclic codes C
of length 9 over F4 , their ordinary duals C ⊥ , and their Hermitian duals C ⊥ H .
We can obtain a result analogous to Theorem 4.4.11 for Hermitian self-orthogonal cyclic
codes over F4 . Again we simply replace µ−1 by µ−2 and apply Theorem 4.4.15. Notice
that if C is a 4-cyclotomic coset modulo n, then either Cµ−2 = C or Cµ−2 = C ′ for a
different 4-cyclotomic coset C ′ , in which case C ′ µ−2 = C as µ2−2 = µ4 and µ4 fixes all
4-cyclotomic cosets.
Theorem 4.4.16 Let C be a Hermitian self-orthogonal cyclic code over F4 of length n with
defining set T . Let C1 , . . . , Ck , D1 , . . . , Dℓ , E 1 , . . . , E ℓ be all the distinct 4-cyclotomic
cosets modulo n partitioned so that Ci = Ci µ−2 for 1 ≤ i ≤ k and Di = E i µ−2 for 1 ≤
i ≤ ℓ. The following hold:
(i) Ci ⊆ T for 1 ≤ i ≤ k and at least one of Di or E i is contained in T for 1 ≤ i ≤ ℓ.
(ii) C is even-like.
(iii) C ∩ Cµ−2 = {0}.
Conversely, if C is a cyclic code with defining set T that satisfies (i), then C is Hermitian
self-orthogonal.
Exercise 263 Prove Theorem 4.4.16.
Corollary 4.4.17 Let D = C + 1 be a cyclic code of length n over F4 such that C is
Hermitian self-orthogonal. Then D ∩ Dµ−2 = 1.
Exercise 264 Prove Corollary 4.4.17.
The next theorem shows the rather remarkable fact that a binary self-orthogonal cyclic
code must be doubly-even.
Theorem 4.4.18 A self-orthogonal binary cyclic code is doubly-even.
Proof: Let C be an [n, k] self-orthogonal binary cyclic code with defining set T . By Theorem 1.4.5(iv), C has only even weight codewords and hence 0 ∈ T by Exercise 238. Suppose
that C is not doubly-even. Then the subcode C 0 of C consisting of codewords of weights
divisible by 4 has dimension k − 1 by Theorem 1.4.6. Clearly, C 0 is cyclic as the cyclic
shift of a vector is a vector of the same weight. By Theorem 4.4.2 and Corollary 4.2.5 (or
Exercise 239), the defining set of C_0 is T ∪ {a} for some a ∉ T. But then {a} must be a
2-cyclotomic coset modulo n, which implies that 2a ≡ a (mod n). Hence a = 0 as n is odd,
which is impossible as 0 ∈ T.
In Theorem 4.3.8, the minimal cyclic codes in Rn are shown to be those with generator
polynomials g(x) where (x n − 1)/g(x) is irreducible over Fq . So minimal cyclic codes are
sometimes called irreducible cyclic codes. These minimal cyclic codes can be described
using the trace function.
Theorem 4.4.19 Let g(x) be an irreducible factor of x n − 1 over Fq . Suppose g(x) has
degree s, and let γ ∈ Fq s be a root of g(x). Let Trs : Fq s → Fq be the trace map from Fq s
to Fq . Then
C_γ = { ∑_{i=0}^{n−1} Tr_s(ξγ^i) x^i | ξ ∈ F_{q^s} }

is the [n, s] irreducible cyclic code with nonzeros {γ^{−q^i} | 0 ≤ i < s}.
Proof: By Lemma 3.8.5, C_γ is a nonzero linear code over F_q. If c_ξ(x) = ∑_{i=0}^{n−1} Tr_s(ξγ^i) x^i,
then c_{ξγ^{−1}}(x) = c_ξ(x) x in R_n, implying that C_γ is cyclic. Let g(x) = ∑_{i=0}^{n−1} g_i x^i. By
Lemma 3.8.5, as g_i ∈ F_q and g(γ) = 0,

∑_{i=0}^{n−1} g_i Tr_s(ξγ^i) = Tr_s( ξ ∑_{i=0}^{n−1} g_i γ^i ) = Tr_s(0) = 0.

Hence ⟨g(x)⟩ ⊆ C_γ^⊥. By Theorem 4.4.9, C_γ^⊥ is a cyclic code not equal to R_n as C_γ ≠ {0}.
As g(x) is irreducible over F_q, there can be no proper cyclic codes between ⟨g(x)⟩ and R_n.
So ⟨g(x)⟩ = C_γ^⊥. The result follows from Theorem 4.4.9(iii).
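Theorem 4.4.19 can be carried out concretely for q = 2, n = 7, g(x) = 1 + x + x^3, s = 3, γ = α. The sketch below (our own encoding, an assumption not in the text: F_8 as 3-bit integers modulo x^3 + x + 1) enumerates C_γ and finds a [7, 3] code all of whose nonzero codewords have weight 4, i.e. the dual of the [7, 4] Hamming code ⟨g(x)⟩, as the proof predicts.

```python
def gf8_mul(a, b):
    """Multiply two elements of F_8 represented as 3-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011              # reduce modulo x^3 + x + 1
        b >>= 1
    return r

def tr(x):
    """Tr_3(x) = x + x^2 + x^4, which always lands in F_2 = {0, 1}."""
    x2 = gf8_mul(x, x)
    return x ^ x2 ^ gf8_mul(x2, x2)

gamma_pow = [1]
for _ in range(6):
    gamma_pow.append(gf8_mul(gamma_pow[-1], 0b010))   # powers of γ = α

code = {tuple(tr(gf8_mul(xi, g)) for g in gamma_pow) for xi in range(8)}
assert len(code) == 8                              # 2^s codewords: dimension s = 3
assert all(sum(c) == 4 for c in code if any(c))    # every nonzero weight equals 4
```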
4.5 Minimum distance of cyclic codes
With any code, it is important to be able to determine the minimum distance in order
to determine its error-correcting capability. It is therefore helpful to have bounds on the
minimum distance, particularly lower bounds. There are several known lower bounds for
the minimum distance of a cyclic code. The oldest of these is the Bose–Ray-Chaudhuri–
Hocquenghem Bound [28, 132], usually called the BCH Bound, which is fundamental to
the definition of the BCH codes presented in Chapter 5. Improvements of this bound have
been obtained by Hartmann and Tzeng [117], and later van Lint and Wilson [203]. The BCH
Bound, the Hartmann–Tzeng Bound, and a bounding technique of van Lint and Wilson are
presented here. The BCH and Hartmann–Tzeng Bounds depend on the zeros of the code
and especially on the ability to find strings of “consecutive” zeros.
Before proceeding with the BCH Bound, we state a lemma, used in the proof of the BCH
Bound and useful elsewhere as well, about the determinant of a Vandermonde matrix. Let
α_1, . . . , α_s be elements in a field F. The s × s matrix V = [v_{i,j}], where v_{i,j} = α_j^{i−1}, is called
a Vandermonde matrix. Note that the transpose of this matrix is also called a Vandermonde
matrix.
Lemma 4.5.1 det V = ∏_{1≤i<j≤s} (α_j − α_i). In particular, V is nonsingular if the elements
α_1, . . . , α_s are distinct.
In this section we will assume that C is a cyclic code of length n over Fq and that α is
a primitive nth root of unity in Fq t , where t = ordn (q). Recall that T is a defining set for
C provided the zeros of C are {α i | i ∈ T }. So T must be a union of q-cyclotomic cosets
modulo n. We say that T contains a set of s consecutive elements S provided there is a set
{b, b + 1, . . . , b + s − 1} of s consecutive integers such that
{b, b + 1, . . . , b + s − 1} mod n = S ⊆ T.
Example 4.5.2 Consider the binary cyclic code C of length 7 with defining set T =
{0, 3, 6, 5}. Then T has a set of three consecutive elements S = {5, 6, 0}.
Theorem 4.5.3 (BCH Bound) Let C be a cyclic code of length n over Fq with defining set
T . Suppose C has minimum weight d. Assume T contains δ − 1 consecutive elements for
some integer δ. Then d ≥ δ.
Proof: By assumption, C has zeros that include α^b, α^{b+1}, . . . , α^{b+δ−2}. Let c(x) be a nonzero
codeword in C of weight w, and let

c(x) = ∑_{j=1}^{w} c_{i_j} x^{i_j}.

Assume to the contrary that w < δ. As c(α^i) = 0 for b ≤ i ≤ b + δ − 2, Mu^T = 0,
where

M = [ α^{i_1 b}         α^{i_2 b}         ···   α^{i_w b}
      α^{i_1(b+1)}      α^{i_2(b+1)}      ···   α^{i_w(b+1)}
            ⋮                 ⋮                       ⋮
      α^{i_1(b+w−1)}    α^{i_2(b+w−1)}    ···   α^{i_w(b+w−1)} ]

and u = c_{i_1} c_{i_2} ··· c_{i_w}. Since u ≠ 0, M is a singular matrix and hence det M = 0. But det M =
α^{(i_1+i_2+···+i_w)b} det V, where V is the Vandermonde matrix

V = [ 1               1               ···   1
      α^{i_1}         α^{i_2}         ···   α^{i_w}
            ⋮               ⋮                     ⋮
      α^{i_1(w−1)}    α^{i_2(w−1)}    ···   α^{i_w(w−1)} ] .

Since the α^{i_j} are distinct, det V ≠ 0 by Lemma 4.5.1, contradicting det M = 0.
Exercise 265 Find a generator polynomial and generator matrix of a triple error-correcting
[15, 5] binary cyclic code.
The BCH Bound asserts that you want to find the longest set of consecutive elements in
the defining set. However, the defining set depends on the primitive element chosen. Let
β = α^a, where gcd(a, n) = 1; so β is also a primitive nth root of unity. Therefore, if a^{−1} is
the multiplicative inverse of a modulo n, the minimal polynomials M_{α^s}(x) and M_{β^{a^{−1}s}}(x)
are equal. So the code with defining set T , relative to the primitive element α, is the same
as the code with defining set a −1 T mod n, relative to the primitive element β. Thus when
applying the BCH Bound, or any of our other lower bounds, a higher lower bound may
be obtained if you apply a multiplier to the defining set. Alternately, the two codes C and
Cµa are equivalent and have defining sets T and a −1 T (by Corollary 4.4.5) with respect
to the same primitive element α; hence they have the same minimum weight and so either
defining set can be used to produce the best bound.
Example 4.5.4 Let C be the [31, 25, d] binary cyclic code with defining set T =
{0, 3, 6, 12, 24, 17}. Applying the BCH Bound to C, we see that d ≥ 2, as the longest
consecutive set in T is size 1. However, multiplying T by 3−1 ≡ 21 (mod 31), we have
3−1 T mod 31 = {0, 1, 2, 4, 8, 16}. Replacing α by α 3 or C by Cµ3 and applying the BCH
Bound, we obtain d ≥ 4. In fact C is the even weight subcode of the Hamming code H5
and d = 4.
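The multiplier trick of Example 4.5.4 is easy to automate: compute the longest run of consecutive elements in aT mod n over all a with gcd(a, n) = 1 (as a runs over the units, the sets aT are exactly the defining sets a^{−1}T of the equivalent codes Cµ_a of Corollary 4.4.5). A sketch (names ours):

```python
from math import gcd

def bch_bound(T, n):
    """d >= delta, where delta - 1 is the longest run of consecutive elements mod n in T."""
    T = set(T)
    run = 0
    for b in T:
        r = 0
        while (b + r) % n in T and r < n:
            r += 1
        run = max(run, r)
    return run + 1

def best_bch_bound(T, n):
    """Best BCH bound over all equivalent defining sets aT mod n."""
    return max(bch_bound({a * t % n for t in T}, n)
               for a in range(1, n) if gcd(a, n) == 1)

T = {0, 3, 6, 12, 17, 24}              # Example 4.5.4, n = 31
assert bch_bound(T, 31) == 2           # longest run in T itself has length 1
assert best_bch_bound(T, 31) == 4      # a = 21 gives {0, 1, 2, 4, 8, 16}: run {0, 1, 2}
```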
Example 4.5.5 Let C be the [23, 12, d] binary cyclic code with defining set T =
{1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18}. The BCH Bound implies that d ≥ 5 as T has four
consecutive elements. Notice that T = C1 , the 2-cyclotomic coset modulo 23 containing 1. Modulo 23, there are only two other 2-cyclotomic cosets: C0 = {0} and C5 =
{5, 7, 10, 11, 14, 15, 17, 19, 20, 21, 22}. Let C e be the subcode of C of even weight
codewords; C e is cyclic with defining set C0 ∪ C1 by Exercise 238. By Theorem 4.4.11,
C e is self-orthogonal. Hence C e is doubly-even by Theorem 4.4.18. Therefore its minimum
weight is at least 8; it must be exactly 8 by the Sphere Packing Bound. So C e has nonzero
codewords of weights 8, 12, 16, and 20 only. As C contains the all-one codeword by Exercise 238, C e cannot contain a codeword of weight 20 as adding such a codeword to 1
produces a weight 3 codeword, contradicting d ≥ 5. Therefore C e has nonzero codewords
of weights 8, 12, and 16 only. Since C = C e ∪ 1 + C e , C has nonzero codewords of weights
7, 8, 11, 12, 15, 16, and 23 only. In particular d = 7. By Theorem 1.12.3, this code must be
the [23, 12, 7] binary Golay code.
Hartmann and Tzeng [117] showed that if there are several consecutive sets of δ − 1
elements in the defining set that are spaced properly, then the BCH Bound can be improved.
To state the Hartmann–Tzeng Bound, which we do not prove, we develop the following
notation. If A and B are subsets of the integers modulo n, then A + B = {a + b mod n |
a ∈ A, b ∈ B}.
Theorem 4.5.6 (Hartmann–Tzeng Bound) Let C be a cyclic code of length n over Fq
with defining set T . Let A be a set of δ − 1 consecutive elements of T and B = { jb mod n |
0 ≤ j ≤ s}, where gcd(b, n) < δ. If A + B ⊆ T , then the minimum weight d of C satisfies
d ≥ δ + s.
Clearly the BCH Bound is the Hartmann–Tzeng Bound with s = 0.
Example 4.5.7 Let C be the binary cyclic code of length 17 with defining set T =
{1, 2, 4, 8, 9, 13, 15, 16}. There are two consecutive elements in T and so the BCH Bound
gives d ≥ 3. The Hartmann–Tzeng Bound improves this. Let A = {1, 2} and B = {0, 7, 14}.
So δ = 3, b = 7, and s = 2; also gcd(7, 17) = 1 < δ. So C has minimum weight d ≥ 5.
Note that C is a [17, 9] code. By the Griesmer Bound, there is no [17, 9, 7] code. Hence
d = 5 or 6. In fact, d = 5; see [203].
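The Hartmann–Tzeng Bound can be searched for mechanically. This sketch (our own code, not from the text) tries every consecutive set A ⊆ T and every admissible b and s of Theorem 4.5.6, reproducing the bound d ≥ 5 of Example 4.5.7:

```python
from math import gcd

def hartmann_tzeng_bound(T, n):
    """Best value of delta + s over all valid (A, b, s) in Theorem 4.5.6."""
    T, best = set(T), 1
    for start in range(n):
        delta = 2
        while delta <= n and all((start + i) % n in T for i in range(delta - 1)):
            A = [(start + i) % n for i in range(delta - 1)]   # delta - 1 consecutive elements
            for b in range(1, n):
                if gcd(b, n) >= delta:
                    continue
                s = 0
                while s < n and all((a + (s + 1) * b) % n in T for a in A):
                    s += 1            # A + {0, b, ..., sb} still lies inside T
                best = max(best, delta + s)
            delta += 1
    return best

T = {1, 2, 4, 8, 9, 13, 15, 16}       # Example 4.5.7, n = 17
print(hartmann_tzeng_bound(T, 17))    # 5 (e.g. A = {1, 2}, b = 7, s = 2)
```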
Example 4.5.8 Let C be the binary cyclic code of length 31 with defining set T =
{1, 2, 4, 5, 8, 9, 10, 16, 18, 20} and minimum weight d. The BCH Bound shows that d ≥ 4
as the consecutive elements {8, 9, 10} are in T . If d = 4, then the minimum weight vectors
are in the subcode C e of all even-like vectors in C. By Exercise 238, C e is cyclic with defining set Te = {0, 1, 2, 4, 5, 8, 9, 10, 16, 18, 20}. Applying the Hartmann–Tzeng Bound with
A = {0, 1, 2} and B = {0, 8} (since gcd(8, 31) = 1 < δ = 4), the minimum weight of C e is
at least 5. Hence the minimum weight of C is at least 5, which is in fact the true minimum
weight by [203].
Example 4.5.9 Let C be the binary cyclic code of length 31 with defining set T =
{1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 16, 17, 18, 20, 21, 22, 24, 26} and minimum weight d.
Applying the Hartmann–Tzeng Bound with A = {1, 2, 3, 4, 5, 6} and B = {0, 7} (since
gcd(7, 31) = 1 < δ = 7), we obtain d ≥ 8. Suppose that d = 8. Then the cyclic subcode
C e of even-like codewords has defining set Te = T ∪ {0} by Exercise 238. But 17Te contains
the nine consecutive elements {29, 30, 0, 1, 2, 3, 4, 5, 6}. Hence, the code with defining set
17Te has minimum weight at least 10 by the BCH Bound, implying that d = 8 is impossible.
154
Cyclic codes
Thus d ≥ 9. We reconsider this code in Example 4.5.14 where we eliminate d = 9 and
d = 10; the actual value of d is 11.
Exercise 266 Let C be the code of Example 4.5.9. Eliminate the possibility that d = 10
by showing that Theorem 4.4.18 applies to C e .
Exercise 267 Let Ci be the 2-cyclotomic coset modulo n containing i. Apply the BCH
Bound and the Hartmann–Tzeng Bound to cyclic codes of given length and defining set T .
For which of the codes is one of the bounds improved if you multiply the defining set by a,
where gcd(a, n) = 1?
(a) n = 15 and T = C5 ∪ C7 ,
(b) n = 15 and T = C3 ∪ C5 ,
(c) n = 39 and T = C3 ∪ C13 ,
(d) n = 45 and T = C1 ∪ C3 ∪ C5 ∪ C9 ∪ C15 ,
(e) n = 51 and T = C1 ∪ C9 .
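Attacking Exercise 267 requires the cyclotomic cosets themselves; a minimal helper (our own code, assuming Python) generates C_i by repeated multiplication by q modulo n:

```python
def cyclotomic_coset(i, q, n):
    """The q-cyclotomic coset modulo n containing i."""
    coset, j = set(), i % n
    while j not in coset:
        coset.add(j)
        j = (j * q) % n     # closure under multiplication by q mod n
    return coset

print(sorted(cyclotomic_coset(1, 2, 23)))  # [1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18]
print(sorted(cyclotomic_coset(5, 2, 15)))  # [5, 10] -- the C_5 of part (a)
```

The first output is exactly the defining set T = C_1 of Example 4.5.5.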
Exercise 268 Find the dimensions of the codes in Examples 4.5.8 and 4.5.9. For each code
also give upper bounds on the minimum distance from the Griesmer Bound.
Generalizations of the Hartmann–Tzeng Bound discovered by Roos can be found in
[296, 297].
In [203] van Lint and Wilson give techniques that can be used to produce lower bounds
on the minimum weight of a cyclic code; we present one of these, which we will refer to
as the van Lint–Wilson Bounding Technique. The van Lint–Wilson Bounding Technique
can be used to prove both the BCH and the Hartmann–Tzeng Bounds. In order to use the
van Lint–Wilson Bounding Technique, a sequence of subsets of the integers modulo n,
related to the defining set of a code, must be constructed. Let N = {0, 1, . . . , n − 1}. Let
S ⊆ N . A sequence I0 , I1 ,I2 , . . . of subsets of N is an independent sequence with respect
to S provided
1. I0 = ∅, and
2. if i > 0, either Ii = I j ∪ {a} for some 0 ≤ j < i such that I j ⊆ S and a ∈ N \ S, or
Ii = {b} + I j for some 0 ≤ j < i and b ≠ 0.
A subset I of N is independent with respect to S provided that I is a set in an independent
sequence with respect to S. If S = N , then only the empty set is independent with respect
to S. Recall that α is a primitive nth root of unity in Fq t , where t = ordn (q).
Theorem 4.5.10 (van Lint–Wilson Bounding Technique) Suppose f (x) is in Fq [x] and
has degree at most n. Let I be any subset of N that is independent with respect to S = {i ∈
N | f (α i ) = 0}. Then the weight of f (x) is at least the size of I .
Proof: Let f (x) = c1 x i1 + c2 x i2 + · · · + cw x iw , where c j ≠ 0 for 1 ≤ j ≤ w, and let I be
any independent set with respect to S. Let I0 , I1 , I2 , . . . be an independent sequence with
respect to S such that I = Ii for some i ≥ 0. If J ⊆ N , let V (J ) be the set of vectors in F_{q^t}^w
defined by
V (J ) = {(α ki1 , α ki2 , . . . , α kiw ) | k ∈ J }.
If we prove that V (I ) is linearly independent in F_{q^t}^w, then w ≥ |V (I )| = |I |. Hence it suffices to show that V (Ii ) is linearly independent for all i ≥ 0, which we do by induction
on i. As I0 = ∅, V (I0 ) is the empty set and hence is linearly independent. Assume that
V (I j ) is linearly independent for all 0 ≤ j < i. Ii is formed in one of two ways. First
suppose Ii = I j ∪ {a} for some 0 ≤ j < i, where I j ⊆ S and a ∈ N \ S. By the inductive
assumption, V (I j ) is linearly independent. As 0 = f (α k ) = c1 α ki1 + c2 α ki2 + · · · + cw α kiw
for k ∈ I j , the vector c = (c1 , c2 , . . . , cw ) is orthogonal to V (I j ). If V (Ii ) is linearly
dependent, then (α ai1 , α ai2 , . . . , α aiw ) is in the span of V (I j ) and hence is orthogonal
to (c1 , c2 , . . . , cw ). So 0 = c1 α ai1 + c2 α ai2 + · · · + cw α aiw = f (α a ), a contradiction as
a ∉ S. Now suppose that Ii = {b} + I j for some 0 ≤ j < i where b ≠ 0. Then V (Ii ) =
{(α (b+k)i1 , α (b+k)i2 , . . . , α (b+k)iw ) | k ∈ I j } = V (I j )D, where D is the nonsingular diagonal
matrix diag(α bi1 , . . . , α biw ). As V (I j ) is independent, so is V (Ii ).
The BCH Bound is a corollary of the van Lint–Wilson Bounding Technique.
Corollary 4.5.11 Let C be a cyclic code of length n over Fq . Suppose that f (x) is a
nonzero codeword such that f (α b ) = f (α b+1 ) = · · · = f (α b+w−1 ) = 0 but f (α b+w ) ≠ 0.
Then wt( f (x)) ≥ w + 1.
Proof: Let S = {i ∈ N | f (α i ) = 0}. So {b, b + 1, . . . , b + w − 1} ⊆ S but b + w ∉ S.
Inductively define a sequence I0 , I1 , . . . , I2w+1 as follows: Let I0 = ∅. Let I2i+1 = I2i ∪
{b + w} for 0 ≤ i ≤ w and I2i = {−1} + I2i−1 for 1 ≤ i ≤ w. Inductively I2i = {b + w −
i, b + w − i + 1, . . . , b + w − 1} ⊆ S for 0 ≤ i ≤ w. Therefore I0 , I1 , . . . , I2w+1 is an independent sequence with respect to S. Since I2w+1 = {b, b + 1, . . . , b + w}, the van Lint–
Wilson Bounding Technique implies that wt( f (x)) ≥ |I2w+1 | = w + 1.
Exercise 269 Why does Corollary 4.5.11 prove that the BCH Bound holds?
Example 4.5.12 Let C be the [17, 9, d] binary cyclic code with defining set T =
{1, 2, 4, 8, 9, 13, 15, 16}. In Example 4.5.7, we saw that the BCH Bound implies that
d ≥ 3, and the Hartmann–Tzeng Bound improves this to d ≥ 5 using A = {1, 2} and
B = {0, 7, 14}. The van Lint–Wilson Bounding Technique gives the same bound using
the following argument. Let f (x) ∈ C be a nonzero codeword of weight less than 5. T is
the 2-cyclotomic coset C1 . If f (α i ) = 0 for some i ∈ C3 , then f (x) is a nonzero codeword in the cyclic code with defining set C1 ∪ C3 , which is the repetition code, and hence
wt( f (x)) = 17, a contradiction. Letting S = {i ∈ N | f (α i ) = 0}, we assume that S has no
elements of C3 = {3, 5, 6, 7, 10, 11, 12, 14}. Then the following sequence of subsets of N
is independent with respect to S:
I0 = ∅,
I1 = I0 ∪ {6} = {6},
I2 = {−7} + I1 = {16} ⊆ S,
I3 = I2 ∪ {6} = {6, 16},
I4 = {−7} + I3 = {9, 16} ⊆ S,
I5 = I4 ∪ {6} = {6, 9, 16},
I6 = {−7} + I5 = {2, 9, 16} ⊆ S,
I7 = I6 ∪ {3} = {2, 3, 9, 16},
I8 = {−1} + I7 = {1, 2, 8, 15} ⊆ S,
I9 = I8 ∪ {3} = {1, 2, 3, 8, 15}.
Since wt( f (x)) ≥ |I9 | = 5, the van Lint–Wilson Bounding Technique shows that d ≥ 5. In
fact, d = 5 by [203].
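The two rules defining an independent sequence can be checked mechanically. Below is a sketch (our own code) that verifies the sequence I_0, . . . , I_9 of Example 4.5.12, taking S = C_1 for concreteness (the argument above only uses that S contains C_1 and misses C_3):

```python
def is_independent_sequence(seq, S, n):
    """Check rules 1 and 2 of the definition above for subsets of Z_n."""
    if seq[0] != frozenset():
        return False
    for i in range(1, len(seq)):
        ok = False
        for j in range(i):
            extra = seq[i] - seq[j]
            if (seq[j] <= seq[i] and len(extra) == 1
                    and seq[j] <= S and not extra <= S):
                ok = True                   # I_i = I_j ∪ {a} with I_j ⊆ S, a ∉ S
            if any(seq[i] == frozenset((b + x) % n for x in seq[j])
                   for b in range(1, n)):
                ok = True                   # I_i = {b} + I_j with b ≠ 0
        if not ok:
            return False
    return True

n = 17
S = frozenset({1, 2, 4, 8, 9, 13, 15, 16})      # S = C_1 for concreteness
seq = [frozenset(I) for I in
       [(), (6,), (16,), (6, 16), (9, 16), (6, 9, 16), (2, 9, 16),
        (2, 3, 9, 16), (1, 2, 8, 15), (1, 2, 3, 8, 15)]]
print(is_independent_sequence(seq, S, n))       # True, so wt(f(x)) >= |I_9| = 5
```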
This example shows the difficulty in applying the van Lint–Wilson Bounding Technique.
In this example, the Hartmann–Tzeng and van Lint–Wilson Bounding Technique give the
same bound. In this example, in order to apply the van Lint–Wilson Bounding Technique,
we construct an independent sequence whose sets are very closely related to the sets A and
B used in the Hartmann–Tzeng Bound. This construction mimics that used in the proof
of Corollary 4.5.11 where we showed that the BCH Bound follows from the van Lint–
Wilson Bounding Technique. If you generalize the construction in this example, you can
show that the Hartmann–Tzeng Bound also follows from the van Lint–Wilson Bounding
Technique.
Exercise 270 Prove that the Hartmann–Tzeng Bound follows from the van Lint–Wilson
Bounding Technique. Hint: See Example 4.5.12.
Exercise 271 Let C be the [21, 13, d] binary cyclic code with defining set T =
{3, 6, 7, 9, 12, 14, 15, 18}. Find a lower bound on the minimum weight of C using the
BCH Bound, the Hartmann–Tzeng Bound, and the van Lint–Wilson Bounding Technique.
Also using the van Lint–Wilson Bounding Technique, show that an odd weight codeword
has weight at least 7. Hint: For the latter, note that α 0 is not a root of an odd weight
codeword.
Exercise 272 Let C be the [41, 21, d] binary cyclic code with defining set T =
{3, 6, 7, 11, 12, 13, 14, 15, 17, 19, 22, 24, 26, 27, 28, 29, 30, 34, 35, 38}.
(a) Find the 2-cyclotomic cosets modulo 41.
(b) Let f (x) be a nonzero codeword and let S = {i ∈ N | f (α i ) = 0}. Show that if 1 ∈ S,
then f (x) is either 0 or 1 + x + · · · + x 40 .
(c) Assume that 1 ∉ S. Show that either S = T or S = T ∪ {0}.
(d) Now assume that f (x) has weight 8 or less. Show, by applying the rules for constructing independent sequences, that the following sets are part of an independent sequence
with respect to S: {28}, {12, 17}, {19, 22, 27}, {3, 6, 11, 30}, {11, 12, 14, 19, 38},
{6, 12, 26, 27, 29, 34}, {6, 7, 13, 27, 28, 30, 35}, and {1, 6, 7, 13, 27, 28, 30, 35}.
(e) From part (d), show that wt( f (x)) ≥ 8.
(f) If wt( f (x)) = 8, show that {0, 14, 15, 17, 22, 29, 34, 35} and {0, 1, 14, 15, 17, 22,
29, 34, 35} are independent with respect to S.
(g) Show that d ≥ 9. (In fact, d = 9 by [203].)
We conclude this section with the binary version of a result of McEliece [231] that shows
what powers of 2 are divisors of a binary cyclic code; we give only a partial proof as the
full proof is very difficult. Recall that an integer Δ is a divisor of a code provided every codeword has
weight a multiple of Δ.
Theorem 4.5.13 (McEliece) Let C be a binary cyclic code with defining set T . Let a ≥ 2
be the smallest number of elements in N \ T , with repetitions allowed, that sum to 0. Then
2^{a−1} is a divisor of C but 2^a is not.
Proof: We prove only the case a = 2 and part of the case a = 3.
If a = 2, then 0 ∈ T ; otherwise a = 1. Thus C has only even weight vectors by Exercise 238, proving that 2^{a−1} = 2 is a divisor of C. By definition of a, there is an element
b ∈ N \ T such that −b mod n ∈ N \ T . So b ∈ N \ (−1)T mod n; as 0 ∈ T , b ≠ 0. By
Theorem 4.4.9 N \ (−1)T mod n is the defining set for C ⊥ . If 2^a = 4 is a divisor of C,
by Theorem 1.4.8, C is self-orthogonal. Hence C ⊆ C ⊥ and by Corollary 4.2.5 (or Exercise 239), N \ (−1)T mod n ⊆ T . This is a contradiction as b ∈ N \ (−1)T mod n and
b ∈ N \ T.
If a = 3, we only prove that 2^{a−1} = 4 is a divisor of C. We show that N \ (−1)T mod n ⊆
T . Suppose not. Then there exists b ∈ N \ (−1)T mod n with b ∈ T . So −b mod n ∈
N \ T and b ∈ N \ T . Since b + (−b) = 0, two elements of N \ T sum to 0, contradicting a = 3. So
N \ (−1)T mod n ⊆ T . Therefore C ⊆ C ⊥ by Theorem 4.4.9 and Corollary 4.2.5 (or Exercise 239). Thus C is a self-orthogonal binary cyclic code which must be doubly-even by
Theorem 4.4.18.
Example 4.5.14 Let C be the [31, 11, d] binary cyclic code with defining set T =
{1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 16, 17, 18, 20, 21, 22, 24, 26} that we considered in
Example 4.5.9. There we showed that d ≥ 9. The subcode C e of even weight vectors in C is a cyclic code with defining set Te = {0} ∪ T by Exercise 238. Notice that
N \ Te = {7, 14, 15, 19, 23, 25, 27, 28, 29, 30} and that the sum of any two elements in
this set is not 0 modulo 31, but 7 + 25 + 30 ≡ 0 (mod 31). So a = 3 in McEliece’s
Theorem applied to C e . Thus all vectors in C e , and hence all even weight vectors in C,
have weights a multiple of 4. In particular, C has no codewords of weight 10. Furthermore
1 + x + · · · + x 30 = (x n − 1)/(x − 1) ∈ C as 0 ∉ T . Thus if C has a vector of weight w,
it has a vector of weight 31 − w. Specifically, if C has a codeword of weight 9, it has a
codeword of weight 22, a contradiction. Thus we have d ≥ 11. The Griesmer Bound gives
d ≤ 12. By [203], d = 11.
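For small lengths, the quantity a in McEliece's Theorem can be found by brute force. A sketch (our own code) reproducing a = 3 for the subcode C_e of Example 4.5.14:

```python
from itertools import combinations_with_replacement

def mceliece_a(T, n):
    """Smallest k >= 1 such that some k elements of N \\ T (repeats allowed) sum to 0 mod n."""
    outside = sorted(set(range(n)) - set(T))
    k = 1
    while True:
        if any(sum(c) % n == 0 for c in combinations_with_replacement(outside, k)):
            return k
        k += 1

# T_e = {0} ∪ T for the [31, 11] code of Examples 4.5.9 and 4.5.14
Te = {0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 16, 17, 18, 20, 21, 22, 24, 26}
print(mceliece_a(Te, 31))   # 3 (e.g. 7 + 25 + 30 ≡ 0 mod 31)
```

With a = 3, McEliece's Theorem gives the divisibility of all weights of C_e by 4 used in the example.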
Exercise 273 Let C be the [21, 9, d] binary cyclic code with defining set T = {0, 1, 2,
3, 4, 6, 7, 8, 11, 12, 14, 16}. Show that d ≥ 8.
4.6
Meggitt decoding of cyclic codes
In this section we present a technique for decoding cyclic codes called Meggitt decoding
[238, 239]. There are several variations of Meggitt decoding; we will present two of them.
Meggitt decoding is a special case of permutation decoding, which will be explored further
in Section 10.2. Permutation decoding itself is a special case of error trapping.
Let C be an [n, k, d] cyclic code over Fq with generator polynomial g(x) of degree
n − k; C will correct t = ⌊(d − 1)/2⌋ errors. Suppose that c(x) ∈ C is transmitted and
y(x) = c(x) + e(x) is received, where e(x) = e0 + e1 x + · · · + en−1 x n−1 is the error vector
with wt(e(x)) ≤ t. The Meggitt decoder stores syndromes of error patterns with coordinate
n − 1 in error. The two versions of the Meggitt Decoding Algorithm that we present can
briefly be described as follows. In the first version, by shifting y(x) at most n times, the
decoder finds the error vector e(x) from the list and corrects the errors. In the second
version, by shifting y(x) until an error appears in coordinate n − 1, the decoder finds the
error in that coordinate, corrects only that error, and then corrects errors in coordinates
n − 2, n − 3, . . . , 1, 0 in that order by further shifting. As you can see, Meggitt decoding
takes advantage of the cyclic nature of the code.
For any vector v(x) ∈ Fq [x], let Rg(x) (v(x)) be the unique remainder when v(x) is divided by g(x) according to the Division Algorithm; that is, Rg(x) (v(x)) = r (x), where
v(x) = g(x) f (x) + r (x) with r (x) = 0 or deg r (x) < n − k. The function Rg(x) satisfies
the following properties; the proofs are left as an exercise.
Theorem 4.6.1 With the preceding notation the following hold:
(i) Rg(x) (av(x) + bv ′ (x)) = a Rg(x) (v(x)) + b Rg(x) (v ′ (x)) for all v(x), v ′ (x) ∈ Fq [x] and
all a, b ∈ Fq .
(ii) Rg(x) (v(x) + a(x)(x n − 1)) = Rg(x) (v(x)).
(iii) Rg(x) (v(x)) = 0 if and only if v(x) mod (x n − 1) ∈ C.
(iv) If c(x) ∈ C, then Rg(x) (c(x) + e(x)) = Rg(x) (e(x)).
(v) If Rg(x) (e(x)) = Rg(x) (e′ (x)), where e(x) and e′ (x) each have weight at most t, then
e(x) = e′ (x).
(vi) Rg(x) (v(x)) = v(x) if deg v(x) < n − k.
Exercise 274 Prove Theorem 4.6.1.
Part (ii) of this theorem shows that we can apply Rg(x) to either elements of Rn or elements
of Fq [x] without ambiguity. We now need a theorem due to Meggitt that will simplify our
computations with Rg(x) .
Theorem 4.6.2 Let g(x) be a monic divisor of x n − 1 of degree n − k. If Rg(x) (v(x)) = s(x),
then
Rg(x) (xv(x) mod (x n − 1)) = Rg(x) (xs(x)) = xs(x) − g(x)sn−k−1 ,
where sn−k−1 is the coefficient of x n−k−1 in s(x).
Proof: By definition v(x) = g(x) f (x) + s(x), where s(x) = ∑_{i=0}^{n−k−1} s_i x^i . So xv(x) =
xg(x) f (x) + xs(x) = xg(x) f (x) + g(x) f_1 (x) + s′(x), where s′(x) = Rg(x) (xs(x)). Also
xv(x) mod (x^n − 1) = xv(x) − (x^n − 1)v_{n−1} . Thus xv(x) mod (x^n − 1) = xg(x) f (x) +
g(x) f_1 (x) + s′(x) − (x^n − 1)v_{n−1} = (x f (x) + f_1 (x) − h(x)v_{n−1} )g(x) + s′(x), where
g(x)h(x) = x^n − 1. Therefore Rg(x) (xv(x) mod (x^n − 1)) = s′(x) = Rg(x) (xs(x)) because
deg s′(x) < n − k. As g(x) is monic of degree n − k and xs(x) = ∑_{i=0}^{n−k−1} s_i x^{i+1} , the
remainder when xs(x) is divided by g(x) is xs(x) − g(x)s_{n−k−1} .
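Theorem 4.6.2 is easy to confirm computationally over F_2 if a polynomial is represented by the set of exponents of its nonzero terms. The helper rem below is our own; the example uses the g(x) of Example 4.6.3 and an arbitrarily chosen v(x):

```python
def rem(v, g):
    """R_g(v) over F_2; a polynomial is the set of exponents of its nonzero terms."""
    v, d = set(v), max(g)
    while v and max(v) >= d:
        shift = max(v) - d
        v ^= {e + shift for e in g}   # subtract x^shift * g(x)
    return v

n = 15
g = {0, 4, 6, 7, 8}                   # g(x) = 1 + x^4 + x^6 + x^7 + x^8 divides x^15 - 1
v = {2, 5, 9, 14}                     # an arbitrary v(x) of degree < n
s = rem(v, g)

# Theorem 4.6.2: R_g(x v(x) mod (x^n - 1)) = x s(x) - g(x) s_{n-k-1}
lhs = rem({(e + 1) % n for e in v}, g)
xs = {e + 1 for e in s}
rhs = xs ^ g if 7 in s else xs        # subtract g(x) exactly when s_7 = 1
print(lhs == rhs)                     # True
print(sorted(rem({8}, g)))            # [0, 4, 6, 7]: R_g(x^8) = 1 + x^4 + x^6 + x^7
```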
We now describe our first version of the Meggitt Decoding Algorithm and use an example
to illustrate each step. Define the syndrome polynomial S(v(x)) of any v(x) to be
S(v(x)) = Rg(x) (x n−k v(x)).
By Theorem 4.6.1(iii), if v(x) ∈ Rn , then S(v(x)) = 0 if and only if v(x) ∈ C.
Step I:
This is a one-time precomputation. Find all the syndrome polynomials S(e(x)) of error
patterns e(x) = ∑_{i=0}^{n−1} e_i x^i such that wt(e(x)) ≤ t and e_{n−1} ≠ 0.
Example 4.6.3 Let C be the [15, 7, 5] binary cyclic code with defining set T =
{1, 2, 3, 4, 6, 8, 9, 12}. Let α be a 15th root of unity in F16 . Then g(x) = 1 + x 4 +
x 6 + x 7 + x 8 is the generator polynomial of C and the syndrome polynomial of e(x) is
S(e(x)) = Rg(x) (x 8 e(x)). Step I produces the following syndrome polynomials:
e(x)              S(e(x))                e(x)             S(e(x))
x^14              x^7                    x^6 + x^14       x^3 + x^5 + x^6
x^13 + x^14       x^6 + x^7              x^5 + x^14       x^2 + x^4 + x^5 + x^6 + x^7
x^12 + x^14       x^5 + x^7              x^4 + x^14       x + x^3 + x^4 + x^5 + x^7
x^11 + x^14       x^4 + x^7              x^3 + x^14       1 + x^2 + x^3 + x^4 + x^7
x^10 + x^14       x^3 + x^7              x^2 + x^14       x + x^2 + x^5 + x^6
x^9 + x^14        x^2 + x^7              x + x^14         1 + x + x^4 + x^5 + x^6 + x^7
x^8 + x^14        x + x^7                1 + x^14         1 + x^4 + x^6
x^7 + x^14        1 + x^7
The computations of these syndrome polynomials were aided by Theorems 4.6.1 and
4.6.2. For example, in computing the syndrome polynomial of x 12 + x 14 , we have S(x 12 +
x 14 ) = Rg(x) (x 8 (x 12 + x 14 )) = Rg(x) (x 5 + x 7 ) = x 5 + x 7 using Theorem 4.6.1(vi). In
computing the syndrome polynomial for 1 + x 14 , first observe that Rg(x) (x 8 ) = 1 + x 4 +
x 6 + x 7 ; then S(1 + x 14 ) = Rg(x) (x 8 (1 + x 14 )) = Rg(x) (x 8 ) + Rg(x) (x 7 ) = 1 + x 4 + x 6 .
We see by Theorem 4.6.2 that Rg(x) (x 9 ) = Rg(x) (x x 8 ) = Rg(x) (x(1 + x 4 + x 6 + x 7 )) =
Rg(x) (x + x 5 + x 7 ) + Rg(x) (x 8 ) = x + x 5 + x 7 + 1 + x 4 + x 6 + x 7 = 1 + x + x 4 + x 5 +
x 6 . Therefore in computing the syndrome polynomial for x + x 14 , we have S(x + x 14 ) =
Rg(x) (x 8 (x + x 14 )) = Rg(x) (x 9 ) + Rg(x) (x 7 ) = 1 + x + x 4 + x 5 + x 6 + x 7 . The others
follow similarly.
Exercise 275 Verify the syndrome polynomials found in Example 4.6.3.
Step II:
Suppose that y(x) is the received vector. Compute the syndrome polynomial S(y(x)) =
Rg(x) (x n−k y(x)). By Theorem 4.6.1(iv), S(y(x)) = S(e(x)), where y(x) = c(x) + e(x) with
c(x) ∈ C.
Example 4.6.4 Continuing with Example 4.6.3, suppose that y(x) = 1 + x 4 + x 7 + x 9 +
x 10 + x 12 is received. Then S(y(x)) = x + x 2 + x 6 + x 7 .
Exercise 276 Verify that S(y(x)) = x + x 2 + x 6 + x 7 in Example 4.6.4.
Step III:
If S(y(x)) is in the list computed in Step I, then you know the error polynomial e(x) and
this can be subtracted from y(x) to obtain the codeword c(x). If S(y(x)) is not in the list,
go on to Step IV.
Example 4.6.5 S(y(x)) from Example 4.6.4 is not in the list of syndrome polynomials
given in Example 4.6.3.
Step IV:
Compute the syndrome polynomial of x y(x), x 2 y(x), . . . in succession until the syndrome
polynomial is in the list from Step I. If S(x i y(x)) is in this list and is associated with the
error polynomial e′ (x), then the received vector is decoded as y(x) − x n−i e′ (x).
The computation in Step IV is most easily carried out using Theorem 4.6.2. As
Rg(x) (x^{n−k} y(x)) = S(y(x)) = ∑_{i=0}^{n−k−1} s_i x^i ,

S(x y(x)) = Rg(x) (x^{n−k} x y(x)) = Rg(x) (x(x^{n−k} y(x))) = Rg(x) (x S(y(x)))
          = x S(y(x)) − s_{n−k−1} g(x).                                    (4.8)
We proceed in the same fashion to get the syndrome of x 2 y(x) from that of x y(x).
Example 4.6.6 Continuing with Example 4.6.4, we have, using (4.8) and S(y(x)) = x +
x 2 + x 6 + x 7 , that S(x y(x)) = x(x + x 2 + x 6 + x 7 ) − 1 · g(x) = 1 + x 2 + x 3 + x 4 + x 6 ,
which is not in the list in Example 4.6.3. Using (4.8), S(x 2 y(x)) = x(1 + x 2 + x 3 + x 4 +
x 6 ) − 0 · g(x) = x + x 3 + x 4 + x 5 + x 7 , which corresponds to the error x 4 + x 14 implying
that y(x) is decoded as y(x) − (x 2 + x 12 ) = 1 + x 2 + x 4 + x 7 + x 9 + x 10 . Note that this
is the codeword (1 + x 2 )g(x).
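Steps I–IV can be assembled into a toy decoder for the [15, 7, 5] code of Examples 4.6.3–4.6.6. This is a sketch with our own helper names, again storing binary polynomials as exponent sets:

```python
def rem(v, g):
    """R_g(v) over F_2; a polynomial is the set of exponents of its nonzero terms."""
    v, d = set(v), max(g)
    while v and max(v) >= d:
        shift = max(v) - d
        v ^= {e + shift for e in g}
    return v

n, t = 15, 2
g = {0, 4, 6, 7, 8}                     # g(x) = 1 + x^4 + x^6 + x^7 + x^8

def S(v):                               # S(v(x)) = R_g(x^{n-k} v(x)), n - k = 8
    return frozenset(rem({(e + 8) % n for e in v}, g))

# Step I: syndromes of all error patterns of weight <= t with e_{n-1} != 0
table = {S({n - 1}): frozenset({n - 1})}
for i in range(n - 1):
    table[S({i, n - 1})] = frozenset({i, n - 1})

# Steps II-IV on the received word y(x) of Example 4.6.4
y = {0, 4, 7, 9, 10, 12}                # 1 + x^4 + x^7 + x^9 + x^10 + x^12
shifted = set(y)
for i in range(n):
    if S(shifted) in table:             # Step III: syndrome of x^i y(x) in the list
        e = {(j - i) % n for j in table[S(shifted)]}
        decoded = sorted(set(y) ^ e)    # subtract the shifted-back error pattern
        break
    shifted = {(a + 1) % n for a in shifted}   # Step IV: shift again

print(decoded)                          # [0, 2, 4, 7, 9, 10], i.e. (1 + x^2)g(x)
```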
The Meggitt decoder can be implemented with shift-register circuits, and a second version
of the Meggitt Decoding Algorithm is often employed in order to simplify this circuit. In
this second implementation, the circuitry only corrects coordinate n − 1 of the vector in
the shift-register. The vector in the shift-register starts out as the received vector y(x); if
there is an error in position n − 1, it is corrected. Then the shift x y(x) is moved into the
shift-register. If there is an error in coordinate n − 1, it is corrected; this in effect corrects
coordinate n − 2 of y(x). This continues until y(x) and its n − 1 shifts have been moved
into the shift-register and have been examined. At each stage only the error at the end of
the shift-register is corrected. This process corrects the original received vector y(x) since
a correction in coordinate n − 1 of x i y(x) produces a correction in coordinate n − 1 − i
of y(x). This version of Meggitt decoding allows the use of shift-registers whose internal
stages do not need to be directly accessed and modified and allows for fewer wires in the
circuit. For those interested in the circuit designs, consult [21]. We illustrate this in the
binary case by continuing with Example 4.6.6.
Example 4.6.7 After computing the syndrome polynomials of y(x), x y(x), and x 2 y(x),
we conclude there is an error in coordinate 14 of x 2 y(x), that is, coordinate 12 of
y(x). Thus we modify x 2 y(x) in coordinate 14 to obtain x 2 y ′ (x) = x 2 + x 6 + x 9 + x 11 +
x 12 (where y ′ (x) = 1 + x 4 + x 7 + x 9 + x 10 is y(x) corrected in coordinate 12). This
changes the syndrome as well. Fortunately, the change is simple to deal with as follows. S(x 2 y ′ (x)) = Rg(x) (x 8 (x 2 y ′ (x))) = Rg(x) (x 8 (x 2 y(x))) − Rg(x) (x 8 x 14 ) = S(x 2 y(x)) −
Rg(x) (x 22 ) because x 2 y ′ (x) = x 2 y(x) − x 14 since we changed only coordinate 14 of x 2 y(x).
But S(x 2 y(x)) − Rg(x) (x 22 ) = x + x 3 + x 4 + x 5 + x 7 − x 7 = x + x 3 + x 4 + x 5 from Example 4.6.6 as Rg(x) (x 22 ) = Rg(x) (x 7 ) = x 7 . Thus to get the new syndrome for x 2 y ′ (x), we
take S(x 2 y(x)) and subtract x 7 . This holds in general: namely, to obtain the new syndrome
polynomial after an error in coordinate n − 1 has been corrected, take the old syndrome and subtract x^{n−k−1} (here x 7 ). This
simple modification in the syndrome polynomial is precisely why the definition of syndrome S(v(x)) as Rg(x) (x n−k v(x)) was given; had we used the more natural Rg(x) (v(x)) as
the definition of the syndrome polynomial of v(x), we would have had a more complicated
modification of the syndrome at this juncture. So we compute and obtain the following
table:
Syndrome of
Syndrome
Syndrome of
Syndrome
x 3 y ′ (x)
x 4 y ′ (x)
x 5 y ′ (x)
x 6 v ′ (x)
x 7 y ′ (x)
x2 + x4 + x5 + x6
x3 + x5 + x6 + x7
1
x
x2
x 8 y ′ (x)
x 9 y ′ (x)
x 10 y ′ (x)
x 11 y ′ (x)
x 12 y ′ (x)
x3
x4
x5
x6
x7
We see that none of our syndromes is in the list from Step I until that of x 12 y ′ (x). Hence
we change coordinate 14 of x 12 y ′ (x) = x + x 4 + x 6 + x 7 + x 12 to obtain x 12 y ′′ (x) = x +
x 4 + x 6 + x 7 + x 12 + x 14 (where y ′′ (x) = 1 + x 2 + x 4 + x 7 + x 9 + x 10 is y ′ (x) changed
in coordinate 2). As above, the syndrome of x 12 y ′′ (x) is x 7 − x 7 = 0. We could stop here, if
our circuit is designed to check for the zero syndrome polynomial. Otherwise we compute
the syndromes of x 13 y ′′ (x) and x 14 y ′′ (x), which are both 0 and not on the list, indicating
that the procedure should halt as we have considered all cyclic shifts of our received vector.
So the correct vector is y ′′ (x) and in fact turns out to be the vector output of the shift-register
circuit used in this version of Meggitt decoding.
Exercise 277 Verify the results in Example 4.6.7.
In the binary case where we correct only the high-order errors one at a time, it is unnecessary to store the error polynomial corresponding to each syndrome. Obviously the
speed of our decoding algorithm depends on the size of the list and the speed of the pattern
recognition of the circuit. There are further variations of Meggitt decoding for which we
again refer the reader to [21].
Exercise 278 With the same code as in Example 4.6.3, find the codeword sent if two or
fewer errors have occurred and y(x) = 1 + x + x 6 + x 9 + x 11 + x 12 + x 13 is received. Do
this in two ways:
(a) Carry out Steps I–IV (that is, the first version of the Meggitt Decoding Algorithm).
(b) Carry out Meggitt decoding by correcting only the high-order bits and shifting as in
Example 4.6.7 (that is, the second version of the Meggitt Decoding Algorithm).
4.7
Affine-invariant codes
In this section we will look at extending certain cyclic codes and examine an important class
of codes called affine-invariant codes. Reed–Muller codes and some BCH codes, defined
in Chapter 5, are affine-invariant.
We will present a new setting for “primitive” cyclic codes that will assist us in the
description of affine-invariant codes. A primitive cyclic code over Fq is a cyclic code of
length n = q^t − 1 for some t.¹ To proceed, we need some notation.
Let I denote the field of order q t , which is then an extension field of Fq . The set I will
be the index set of our extended cyclic codes of length q t . Let I ∗ be the nonzero elements
of I, and suppose α is a primitive nth root of unity in I (and hence a primitive element of
I ). The set I ∗ will be the index set of our primitive cyclic codes of length n = q t − 1. With
X an indeterminate, let

Fq [I] = {a = ∑_{g∈I} a_g X^g | a_g ∈ Fq for all g ∈ I}.
The set Fq [I] is actually an algebra under the operations

c ∑_{g∈I} a_g X^g + d ∑_{g∈I} b_g X^g = ∑_{g∈I} (c a_g + d b_g) X^g

for c, d ∈ Fq , and

(∑_{g∈I} a_g X^g)(∑_{g∈I} b_g X^g) = ∑_{g∈I} (∑_{h∈I} a_h b_{g−h}) X^g.
The zero and unity of Fq [I] are ∑_{g∈I} 0 X^g and X^0 , respectively. This is the group algebra
of the additive group of I over Fq . Let

Fq [I∗] = {a = ∑_{g∈I∗} a_g X^g | a_g ∈ Fq for all g ∈ I∗}.
¹ Cyclic codes of length n = q^t − 1 over an extension field of Fq are also called primitive, but we will only study
the more restricted case.
Fq [I∗] is a subspace of Fq [I] but not a subalgebra. So elements of Fq [I∗] are of the form

∑_{i=0}^{n−1} a_{α^i} X^{α^i},

while elements of Fq [I] are of the form

a_0 X^0 + ∑_{i=0}^{n−1} a_{α^i} X^{α^i}.
The vector space Fq [I ∗ ] will be the new setting for primitive cyclic codes, and the algebra
Fq [I] will be the setting for the extended cyclic codes. So in fact both codes are contained
in Fq [I], which makes the discussion of affine-invariant codes more tractable.
Suppose that C is a cyclic code over Fq of length n = q t − 1. The coordinates of C have
been denoted {0, 1, . . . , n − 1}. In Rn , the ith component ci of a codeword c = c0 c1 · · · cn−1 ,
with associated polynomial c(x), is the coefficient of the term ci x i in c(x); the component
ci is kept in position x i . Now we associate c with an element C(X ) ∈ Fq [I ∗ ] as follows:
c ↔ C(X) = ∑_{i=0}^{n−1} C_{α^i} X^{α^i} = ∑_{g∈I∗} C_g X^g ,                 (4.9)

where C_{α^i} = c_i . Thus the ith component of c is the coefficient of the term C_{α^i} X^{α^i} in C(X );
the component c_i is kept in position X^{α^i}.
Example 4.7.1 Consider the element c(x) = 1 + x + x^3 in R7 over F2 . So n = 7 = 2^3 − 1.
Let α be a primitive element of F8 . Then c0 = Cα0 = 1, c1 = Cα1 = 1, and c3 = Cα3 = 1,
with the other ci = Cαi = 0. So

c(x) = 1 + x + x^3 ↔ C(X) = X + X^α + X^{α^3}.
We now need to examine the cyclic shift xc(x) under the correspondence (4.9). We have
xc(x) = c_{n−1} + ∑_{i=1}^{n−1} c_{i−1} x^i ↔ ∑_{i=0}^{n−1} C_{α^{i−1}} X^{α^i} = ∑_{i=0}^{n−1} C_{α^i} X^{α·α^i}.
Example 4.7.2 We continue with Example 4.7.1. Namely,

xc(x) = x + x^2 + x^4 ↔ X^α + X^{α^2} + X^{α^4} = X^{α·1} + X^{α·α} + X^{α·α^3}.
In our new notation a primitive cyclic code over Fq of length n = q t − 1 is any subset C
of Fq [I ∗ ] such that
∑_{i=0}^{n−1} C_{α^i} X^{α^i} = ∑_{g∈I∗} C_g X^g ∈ C    if and only if    ∑_{i=0}^{n−1} C_{α^i} X^{α·α^i} = ∑_{g∈I∗} C_g X^{αg} ∈ C.    (4.10)
The coordinates of our cyclic codes are indexed by I ∗ . We are now ready to extend our
cyclic codes. We use the element 0 ∈ I to index the extended coordinate. The extended
codeword of C(X) = ∑_{g∈I∗} C_g X^g is Ĉ(X) = ∑_{g∈I} C_g X^g such that ∑_{g∈I} C_g = 0.
Example 4.7.3 We continue with Examples 4.7.1 and 4.7.2. If C(X) = X + X^α + X^{α^3},
then Ĉ(X) = X^0 + X + X^α + X^{α^3} = 1 + X + X^α + X^{α^3}.
From (4.10), together with the observation that X^{α·0} = X^0 = 1, in this terminology an
extended cyclic code is a subspace Ĉ of Fq [I] such that

∑_{g∈I} C_g X^g ∈ Ĉ    if and only if    ∑_{g∈I} C_g X^{αg} ∈ Ĉ and ∑_{g∈I} C_g = 0.
With this new notation we want to see where the concepts of zeros and defining sets
come in. This can be done with the assistance of a function φ_s . Let N̂ = {s | 0 ≤ s ≤ n}.
For s ∈ N̂ define φ_s : Fq [I] → I by

φ_s(∑_{g∈I} C_g X^g) = ∑_{g∈I} C_g g^s ,

where by convention 0^0 = 1 in I. Thus φ_0(Ĉ(X)) = ∑_{g∈I} C_g , implying that Ĉ(X) is the
extended codeword of C(X) if and only if φ_0(Ĉ(X)) = 0. In particular, if Ĉ is extended
cyclic, then φ_0(Ĉ(X)) = 0 for all Ĉ(X) ∈ Ĉ. What is φ_s(Ĉ(X)) when 1 ≤ s ≤ n − 1? As
0^s = 0 in I,

φ_s(Ĉ(X)) = ∑_{i=0}^{n−1} C_{α^i}(α^i)^s = ∑_{i=0}^{n−1} C_{α^i}(α^s)^i = ∑_{i=0}^{n−1} c_i(α^s)^i = c(α^s),    (4.11)
where c(x) is the polynomial in Rn associated to C(X) in Fq [I∗]. Suppose our original
code C defined in Rn had defining set T relative to the nth root of unity α. Then (4.11)
shows that if 1 ≤ s ≤ n − 1, s ∈ T if and only if φ_s(Ĉ(X)) = 0 for all Ĉ(X) ∈ Ĉ. Finally,
what is φ_n(Ĉ(X))? Equation (4.11) works in this case as well, implying that α^n = 1 is a
zero of C if and only if φ_n(Ĉ(X)) = 0 for all Ĉ(X) ∈ Ĉ. But α^0 = α^n = 1. Hence we have,
by Exercise 238, that 0 ∈ T if and only if φ_n(Ĉ(X)) = 0 for all Ĉ(X) ∈ Ĉ. We can now
describe an extended cyclic code in terms of a defining set as follows: a code Ĉ of length q^t
is an extended cyclic code with defining set T̂ provided T̂ ⊆ N̂ is a union of q-cyclotomic
cosets modulo n = q^t − 1 with 0 ∈ T̂ and

Ĉ = {Ĉ(X) ∈ Fq [I] | φ_s(Ĉ(X)) = 0 for all s ∈ T̂}.    (4.12)
We make several remarks.
• 0 and n are distinct in N̂ and each is its own q-cyclotomic coset.
• 0 ∈ T̂ because we need all codewords in Ĉ to be even-like.
• If n ∈ T̂, then φ_n(Ĉ(X)) = ∑_{g∈I∗} C_g = 0. As the extended codeword is even-like, since
  0 ∈ T̂, ∑_{g∈I} C_g = 0. Thus the extended coordinate is always 0, a condition that makes
  the extension trivial.
• The defining set T of C and the defining set T̂ of Ĉ are closely related: T̂ is obtained by taking
  T , changing 0 to n if 0 ∈ T , and adding 0.
• s ∈ T̂ with 1 ≤ s ≤ n if and only if α^s is a zero of C.
• By employing the natural ordering 0, α^n, α^1, . . . , α^{n−1} on I, we can make the extended
  cyclic nature of these codes apparent. The coordinate labeled 0 is the parity check coordinate, and puncturing the code Ĉ on this coordinate gives the code C that admits the
  standard cyclic shift α^i → α·α^i as an automorphism.
• To check that a code C in Rn is cyclic with defining set T , one only needs to verify that
  c(x) ∈ C if and only if c(α^s) = 0 for all s ∈ T . Parallel to that, to check that a code Ĉ in
  Fq [I] is extended cyclic with defining set T̂ one only needs to verify that Ĉ(X) ∈ Ĉ if
  and only if φ_s(Ĉ(X)) = 0 for all s ∈ T̂.
Example 4.7.4 Let C be the [7, 4] binary cyclic code with defining set T = {1, 2, 4} and
generator polynomial g(x) = Π_{i∈T}(x − α^i) = 1 + x + x^3, where α is a primitive element
of F8 satisfying α^3 = 1 + α. See Examples 3.4.3 and 4.3.4. The extended generator is
1 + X + X^α + X^{α^3}. Then a generator matrix for Ĉ is
          1   X   X^α   X^{α^2}   X^{α^3}   X^{α^4}   X^{α^5}   X^{α^6}
        [ 1   1    1       0         1         0         0         0
          1   0    1       1         0         1         0         0
          1   0    0       1         1         0         1         0
          1   0    0       0         1         1         0         1 ].
Notice that this is the extended Hamming code Ĥ3. It has defining set T̂ = {0, 1, 2, 4}.
In Section 1.6, we describe how permutation automorphisms act on codes. From the
discussion in that section, a permutation σ of I acts on Ĉ as follows:

(Σ_{g∈I} C_g X^g)σ = Σ_{g∈I} C_g X^{gσ}.
We define the affine group GA1(I) by GA1(I) = {σ_{a,b} | a ∈ I*, b ∈ I}, where gσ_{a,b} =
ag + b. Notice that the maps σ_{a,0} are merely the cyclic shifts on the coordinates
{α^n, α^1, . . . , α^{n−1}}, each fixing the coordinate 0. The group GA1(I) has order (n + 1)n =
q^t(q^t − 1). An affine-invariant code is an extended cyclic code Ĉ over Fq such that
GA1(I) ⊆ PAut(Ĉ). Amazingly, we can easily decide which extended cyclic codes are
affine-invariant by examining their defining sets. In order to do this we introduce a partial
ordering ⪯ on N̂. Suppose that q = p^m, where p is a prime. Then N̂ = {0, 1, . . . , n},
where n = q^t − 1 = p^{mt} − 1. So every element s ∈ N̂ can be written in its p-adic
expansion

s = Σ_{i=0}^{mt−1} s_i p^i,

where 0 ≤ s_i < p for 0 ≤ i < mt.
We say that r ⪯ s provided r_i ≤ s_i for all 0 ≤ i < mt, where r = Σ_{i=0}^{mt−1} r_i p^i is the p-adic
expansion of r. Notice that if r ⪯ s, then in particular r ≤ s. We also need a result called
Lucas' Theorem [209], a proof of which can be found in [18].
166
Cyclic codes
Theorem 4.7.5 (Lucas) Let r = Σ_{i=0}^{mt−1} r_i p^i and s = Σ_{i=0}^{mt−1} s_i p^i be the p-adic expansions
of r and s. Then

(s choose r) ≡ Π_{i=0}^{mt−1} (s_i choose r_i) (mod p).
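Both the partial order ⪯ and Lucas' congruence are easy to check by machine. A small sketch (our own helper names; `math.comb` stands in for the binomial coefficient and returns 0 when the lower index exceeds the upper):

```python
from math import comb

def p_adic(x, p, length):
    """Digits x_0, ..., x_{length-1} with x = sum x_i p^i and 0 <= x_i < p."""
    digits = []
    for _ in range(length):
        digits.append(x % p)
        x //= p
    return digits

def preceq(r, s, p, length):
    """r precedes s when every p-adic digit of r is at most the matching digit of s."""
    return all(ri <= si for ri, si in zip(p_adic(r, p, length), p_adic(s, p, length)))

def lucas_binomial_mod_p(s, r, p, length):
    """Product of digitwise binomials (s_i choose r_i) mod p, as in Lucas' Theorem."""
    out = 1
    for si, ri in zip(p_adic(s, p, length), p_adic(r, p, length)):
        out = (out * comb(si, ri)) % p
    return out

print(preceq(3, 7, 2, 3))                 # 011 ⪯ 111 → True
print(lucas_binomial_mod_p(7, 3, 2, 3))   # → 1 (and indeed C(7, 3) = 35 is odd)
```

Note that (s choose r) ≢ 0 (mod p) exactly when r ⪯ s, which is the form in which the theorem is used below.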
We are now ready to determine the affine-invariant codes from their defining sets, a result
due to Kasami, Lin, and Peterson [162].

Theorem 4.7.6 Let Ĉ be an extended cyclic code of length q^t with defining set T̂. The code
Ĉ is affine-invariant if and only if whenever s ∈ T̂, then r ∈ T̂ for all r ∈ N̂ with r ⪯ s.
Proof: Suppose that C(X) = Σ_{g∈I} C_g X^g ∈ Ĉ. Let s ∈ N̂ and a, b ∈ I with a ≠ 0. So
(C(X))σ_{a,b} = Σ_{g∈I} C_g X^{ag+b}. Therefore,

φ_s((C(X))σ_{a,b}) = Σ_{g∈I} C_g (ag + b)^s = Σ_{g∈I} C_g Σ_{r=0}^{s} (s choose r)(ag)^r b^{s−r}.

By Lucas' Theorem, (s choose r) is nonzero modulo p if and only if r_i ≤ s_i for all 0 ≤ i < mt, where
r = Σ_{i=0}^{mt−1} r_i p^i and s = Σ_{i=0}^{mt−1} s_i p^i are the p-adic expansions of r and s. Hence

φ_s((C(X))σ_{a,b}) = Σ_{r⪯s} (s choose r) Σ_{g∈I} C_g (ag)^r b^{s−r} = Σ_{r⪯s} (s choose r) a^r b^{s−r} Σ_{g∈I} C_g g^r.

Therefore,

φ_s((C(X))σ_{a,b}) = Σ_{r⪯s} (s choose r) a^r b^{s−r} φ_r(C(X)).    (4.13)

Let s be an arbitrary element of T̂ and assume that if r ⪯ s, then r ∈ T̂. By (4.12),
φ_r(C(X)) = 0 as r ∈ T̂, and therefore by (4.13), φ_s((C(X))σ_{a,b}) = 0. As s was an arbitrary
element of T̂, by (4.12), Ĉ is affine-invariant.

Conversely, assume that Ĉ is affine-invariant. Assume that s ∈ T̂ and that r ⪯ s.
We need to show that r ∈ T̂, that is, φ_r(C(X)) = 0 by (4.12). As Ĉ is affine-invariant,
φ_s((C(X))σ_{a,b}) = 0 for all a ∈ I* and b ∈ I. In particular this holds for a = 1; letting
a = 1 in (4.13) yields

0 = Σ_{r⪯s} (s choose r) φ_r(C(X)) b^{s−r}

for all b ∈ I. But the right-hand side of this equation is a polynomial in b of degree at
most s < q^t with all q^t possible b ∈ I as roots. Hence this must be the zero polynomial. So
(s choose r) φ_r(C(X)) = 0 in I for all r ⪯ s. However, by Lucas' Theorem again, (s choose r) ≢ 0 (mod p),
and thus these binomial coefficients are nonzero in I. Hence φ_r(C(X)) = 0, implying that
r ∈ T̂.
Example 4.7.7 Suppose that Ĉ is an affine-invariant code with defining set T̂ such that
n ∈ T̂. By Exercise 279, r ⪯ n for all r ∈ N̂. Thus T̂ = N̂ and Ĉ is the zero code. This
makes sense because if n ∈ T̂, then the extended component of any codeword is always
0. Since the code is affine-invariant and the group GA1(I) is transitive (actually doubly-transitive), every component of any codeword must be zero. Thus Ĉ is the zero code.
Exercise 279 Prove that r ⪯ n for all r ∈ N̂.
Example 4.7.8 The following table gives the defining sets for the binary extended cyclic
codes of length 8. The ones that are affine-invariant are marked.
Defining set                Affine-invariant
{0}                         yes
{0, 1, 2, 4}                yes
{0, 1, 2, 3, 4, 5, 6}       yes
{0, 1, 2, 4, 7}             no
{0, 1, 2, 3, 4, 5, 6, 7}    yes
{0, 3, 5, 6}                no
{0, 3, 5, 6, 7}             no
{0, 7}                      no
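Theorem 4.7.6 reduces the affine-invariance test to a finite, mechanical check, so a table such as the one above can be verified by machine. A Python sketch (helper names are ours):

```python
def digits(x, p, length):
    """p-adic digits x_0, ..., x_{length-1} of x, least significant first."""
    out = []
    for _ in range(length):
        out.append(x % p)
        x //= p
    return out

def preceq(r, s, p, length):
    return all(a <= b for a, b in zip(digits(r, p, length), digits(s, p, length)))

def is_affine_invariant(That, n, p, length):
    """Theorem 4.7.6: T-hat must contain every r in N-hat below one of its elements."""
    return all(r in That for s in That for r in range(n + 1) if preceq(r, s, p, length))

tables = [{0}, {0, 1, 2, 4}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 4, 7},
          {0, 1, 2, 3, 4, 5, 6, 7}, {0, 3, 5, 6}, {0, 3, 5, 6, 7}, {0, 7}]
results = [is_affine_invariant(T, 7, 2, 3) for T in tables]   # length 8, so p = q = 2, n = 7
for T, ok in zip(tables, results):
    print(sorted(T), "yes" if ok else "no")
```

Running this reproduces the yes/no column of the table; for instance {0, 1, 2, 4, 7} fails because 3 ⪯ 7 but 3 is not in the set.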
Notice the rather interesting phenomenon that the extended cyclic codes with defining
sets {0, 1, 2, 4} and {0, 3, 5, 6} are equivalent, yet one is affine-invariant while the other is
not. The equivalence of these codes follows from Exercise 41 together with the fact that
the punctured codes of length 7 are cyclic with defining sets {1, 2, 4} and {3, 5, 6} and
are equivalent under the multiplier µ3. These two extended codes have isomorphic automorphism groups. The automorphism group of the affine-invariant one contains GA1(F8),
while the other contains a subgroup isomorphic, but not equal, to GA1(F8). The code with
defining set {0} is the [8, 7, 2] code consisting of the even weight vectors in F_2^8. The code
with defining set {0, 1, 2, 3, 4, 5, 6} is the repetition code, and the code with defining set
{0, 1, 2, 3, 4, 5, 6, 7} is the zero code. Finally, the code with defining set {0, 1, 2, 4} is one
particular form of the extended Hamming code Ĥ3.
Exercise 280 Verify all the claims in Example 4.7.8.
Exercise 281 Give the defining sets of all binary affine-invariant codes of length 16.
Exercise 282 Give the defining sets of all affine-invariant codes over F4 of length 16.
Exercise 283 Give the defining sets of all affine-invariant codes over F3 of length 9.
Corollary 4.7.9 If C is a primitive cyclic code such that Ĉ is a nonzero affine-invariant code,
then the minimum weight of C is its minimum odd-like weight. In particular, the minimum
weight of a binary primitive cyclic code, whose extension is affine-invariant, is odd.
Proof: As GA1 (I) is transitive, the result follows from Theorem 1.6.6.
5
BCH and Reed–Solomon codes
In this chapter we examine one of the many important families of cyclic codes known as
BCH codes together with a subfamily of these codes called Reed–Solomon codes.
The binary BCH codes were discovered around 1960 by Hocquenghem [132] and independently by Bose and Ray-Chaudhuri [28, 29], and were generalized to all finite fields
by Gorenstein and Zierler [109]. At about the same time as BCH codes appeared in the
literature, Reed and Solomon [294] published their work on the codes that now bear their
names. These codes, which can be described as special BCH codes, were actually first
constructed by Bush [42] in 1952 in the context of orthogonal arrays. Because of their burst
error-correction capabilities, Reed–Solomon codes are used to improve the reliability of
compact discs, digital audio tapes, and other data storage systems.
5.1
BCH codes
BCH codes are cyclic codes designed to take advantage of the BCH Bound. We would like to
construct a cyclic code C of length n over Fq with simultaneously high minimum weight and
high dimension. Having high minimum weight, by the BCH Bound, can be accomplished
by choosing a defining set T for C with a large number of consecutive elements. Since the
dimension of C is n − |T | by Theorem 4.4.2, we would like |T | to be as small as possible.
So if we would like C to have minimum distance at least δ, we can choose a defining set as
small as possible that is a union of q-cyclotomic cosets with δ − 1 consecutive elements.
Let δ be an integer with 2 ≤ δ ≤ n. A BCH code C over Fq of length n and designed
distance δ is a cyclic code with defining set
T = Cb ∪ Cb+1 ∪ · · · ∪ Cb+δ−2 ,
(5.1)
where Ci is the q-cyclotomic coset modulo n containing i. By the BCH Bound this code
has minimum distance at least δ.
Theorem 5.1.1 A BCH code of designed distance δ has minimum weight at least δ.
Proof: The defining set (5.1) contains δ − 1 consecutive elements. The result follows by
the BCH Bound.
Varying the value of b produces a variety of codes with possibly different minimum
distances and dimensions. When b = 1, C is called a narrow-sense BCH code. As with any
cyclic code, if n = q t − 1, then C is called a primitive BCH code.
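The defining set (5.1) and the resulting dimension n − |T| are straightforward to compute. The sketch below (function names ours) builds T as a union of q-cyclotomic cosets; with n = 15, q = 2, and δ = 5 it recovers the classical two-error-correcting [15, 7, 5] binary BCH code:

```python
def cyclotomic_coset(s, q, n):
    """The q-cyclotomic coset of s modulo n."""
    coset, x = set(), s % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset

def bch_defining_set(n, q, delta, b=1):
    """T = C_b ∪ C_{b+1} ∪ ... ∪ C_{b+delta-2}, as in (5.1)."""
    T = set()
    for i in range(b, b + delta - 1):
        T |= cyclotomic_coset(i, q, n)
    return T

# Narrow-sense binary BCH code of length 15 with designed distance 5:
T = bch_defining_set(15, 2, 5)
print(sorted(T))     # → [1, 2, 3, 4, 6, 8, 9, 12]
print(15 - len(T))   # dimension n - |T| (Theorem 4.4.2) → 7
```

By the BCH Bound the code has minimum distance at least 5, so this is a [15, 7] code with d ≥ 5.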
BCH codes are nested in the following sense.
Theorem 5.1.2 For i = 1 and 2, let C i be the BCH code over Fq with defining set Ti =
Cb ∪ Cb+1 ∪ · · · ∪ Cb+δi −2 , where δ1 < δ2 . Then C 2 ⊆ C 1 .
Exercise 284 Prove Theorem 5.1.2.
Example 5.1.3 We construct several BCH codes over F3 of length 13. The 3-cyclotomic
cosets modulo 13 are
C0 = {0}, C1 = {1, 3, 9}, C2 = {2, 5, 6}, C4 = {4, 10, 12}, C7 = {7, 8, 11}.
As ord13 (3) = 3, x 13 − 1 has its roots in F33 . There is a primitive element α in F33 which
satisfies α 3 + 2α + 1 = 0. Then β = α 2 is a primitive 13th root of unity in F33 . Using β,
the narrow-sense BCH code C 1 of designed distance 2 has defining set C1 and generator
polynomial g1 (x) = 2 + x + x 2 + x 3 . By Theorem 5.1.1, the minimum distance is at least
2. However, C 1 µ2 , which is equivalent to C 1 , is the (non-narrow-sense) BCH code with
defining set 2^{−1}C1 = C7 = C8 by Corollary 4.4.5. This code has designed distance 3 and
generator polynomial g7(x) = 2 + 2x + x^3. Thus C 1 is a [13, 10, 3] BCH code. The even-like subcode C 1,e of C 1 is the BCH code with defining set C0 ∪ C1 . C 1,e has designed distance
3 and minimum distance 3 as (x − 1)g1 (x) = 1 + x + x 4 is even-like of weight 3. Note that
the even-like subcode of C 1 µ2 is equivalent to C 1,e but is not BCH. The narrow-sense
BCH code C 2 of designed distance 3 has defining set C1 ∪ C2 . As this defining set also
equals C1 ∪ C2 ∪ C3 , C 2 also has designed distance 4. Its generator polynomial is g1,2 (x) =
1 + 2x + x 2 + 2x 3 + 2x 4 + 2x 5 + x 6 , and (1 + x)g1,2 (x) = 1 + x 4 + x 5 + x 7 has weight
4. Thus C 2 is a [13, 7, 4] BCH code. Finally, the narrow-sense BCH code C 3 of designed
distance 5 has defining set C1 ∪ C2 ∪ C3 ∪ C4 ; this code is also the narrow-sense BCH code
of designed distance 7. C 3 has generator polynomial 2 + 2x 2 + 2x 3 + x 5 + 2x 7 + x 8 + x 9 ,
which has weight 7; thus C 3 is a [13, 4, 7] BCH code. Notice that C 3 ⊂ C 2 ⊂ C 1 by the
nesting property of Theorem 5.1.2.
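The dimensions claimed in Example 5.1.3 follow from the coset sizes, and the minimum weight of the small code C 3 can be confirmed by brute force. A sketch (helper names ours; the generator polynomial is the one stated in the example):

```python
from itertools import product

def coset(s, q, n):
    out, x = set(), s % n
    while x not in out:
        out.add(x)
        x = (x * q) % n
    return out

n, q = 13, 3
C1, C2, C4 = coset(1, q, n), coset(2, q, n), coset(4, q, n)
print(n - len(C1), n - len(C1 | C2), n - len(C1 | C2 | C4))   # dimensions → 10 7 4

# Brute-force the minimum weight of C_3 from its generator polynomial
# g(x) = 2 + 2x^2 + 2x^3 + x^5 + 2x^7 + x^8 + x^9 (degree 9, so messages have degree <= 3).
g = [2, 0, 2, 2, 0, 1, 0, 2, 1, 1]
best = n
for msg in product(range(q), repeat=n - len(g) + 1):
    if any(msg):
        c = [0] * n
        for i, a in enumerate(msg):
            for j, b in enumerate(g):
                c[i + j] = (c[i + j] + a * b) % q
        best = min(best, sum(1 for x in c if x))
print(best)   # minimum weight → 7
```

The loop enumerates all 80 nonzero message polynomials, so the output 7 confirms that C 3 is a [13, 4, 7] code.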
Exercise 285 This exercise continues Example 5.1.3. The minimal polynomials of β, β 2 ,
β 4 , and β 7 are g1 (x) = 2 + x + x 2 + x 3 , g2 (x) = 2 + x 2 + x 3 , g4 (x) = 2 + 2x + 2x 2 +
x 3 , and g7 (x) = 2 + 2x + x 3 , respectively. Find generator polynomials, designed distances,
and minimum distances of all BCH codes over F3 of length 13. Note that computations will
be reduced if multipliers are used to find some equivalences between BCH codes.
The next theorem shows that many Hamming codes are narrow-sense BCH codes.
Theorem 5.1.4 Let n = (q^r − 1)/(q − 1), where gcd(r, q − 1) = 1. Let C be the narrow-sense BCH code with defining set T = C1 . Then C is the Hamming code Hq,r .
Proof: Let γ be a primitive element of Fq^r . The code C is generated by Mα(x),
where α = γ^{q−1} is a primitive nth root of unity. An easy calculation shows that n =
(q − 1) Σ_{i=1}^{r−1} i q^{r−1−i} + r. So gcd(n, q − 1) = gcd(r, q − 1) = 1. By Theorem 3.5.3, the
nonzero elements of Fq in Fq^r are powers of γ where the power is a multiple of
n = (q^r − 1)/(q − 1). As gcd(n, q − 1) = 1, the only element of Fq that is a power of
α is the identity. Therefore if we write the elements of Fq^r as r-tuples in F_q^r, none of the
r -tuples corresponding to α 0 , α 1 , . . . , α n−1 are multiples of one another using only elements
of Fq . This implies that these elements are the distinct points of PG(r − 1, q). The r × n
matrix H , whose columns are the r -tuples corresponding to α 0 , α 1 , . . . , α n−1 , is the parity
check matrix Hq,r of Hq,r as given in Section 1.8.
Corollary 5.1.5 Every binary Hamming code is a primitive narrow-sense BCH code.
Exercise 286 Verify the claim made in the proof of Theorem 5.1.4 that n = (q − 1) ×
Σ_{i=1}^{r−1} i q^{r−1−i} + r, where n = (q^r − 1)/(q − 1).
Exercise 287 Prove Corollary 5.1.5.
But not every Hamming code is equivalent to a BCH code. Indeed as the next example
shows, some Hamming codes are not equivalent to any cyclic code.
Example 5.1.6 In Example 1.3.3, a generator matrix for the [4, 2, 3] ternary Hamming
code C = H3,2 , also called the tetracode, was presented. In Example 1.7.7, its monomial
automorphism group was given; namely, MAut(C) = ΓAut(C) is a group of order 48 generated by diag(1, 1, 1, −1)(1, 2, 3, 4) and diag(1, 1, 1, −1)(1, 2), where MAutPr (C) = Sym4 .
Using this fact a straightforward argument, which we leave to Exercise 288, shows that a
monomial map M, such that C M is cyclic, does not exist.
Exercise 288 Verify that in Example 5.1.6 a monomial map M, such that C M is cyclic,
does not exist.
The Hamming codes of Theorem 5.1.4 have designed distance δ = 2, yet their actual
minimum distance is 3. In the binary case this can be explained as follows. These Hamming
codes are the narrow-sense BCH codes of designed distance δ = 2 with defining set T = C1 .
But in the binary case, C1 = C2 and so T is also the defining set of the narrow-sense BCH
code of designed distance δ = 3. This same argument can be used with every narrow-sense
binary BCH code in that the designed distance can always be assumed to be odd. In the next
theorem we give a lower bound on the dimension of a BCH code in terms of δ and ordn (q).
Of course, the exact dimension is determined by the size of the defining set.
Theorem 5.1.7 Let C be an [n, k] BCH code over Fq of designed distance δ. The following
hold:
(i) k ≥ n − ordn (q)(δ − 1).
(ii) If q = 2 and C is a narrow-sense BCH code, then δ can be assumed to be odd; furthermore if δ = 2w + 1, then k ≥ n − ordn (q)w.
Proof: By Theorem 4.1.4 each q-cyclotomic coset has size a divisor of ordn (q). The defining
set for a BCH code of designed distance δ is the union of at most δ − 1 q-cyclotomic cosets
each of size at most ordn (q). Hence the dimension of the code is at least n − ordn (q)(δ − 1),
giving (i). If the code is narrow-sense and binary, then {1, 2, . . . , δ − 1} ⊆ T . Suppose that
δ is even. Then δ ∈ Cδ/2 ⊆ T , implying that T contains the set {1, 2, . . . , δ} of δ consecutive
elements. Hence we can increase the designed distance by 1 whenever the designed distance
is assumed to be even. So we may assume that δ is odd. If δ = 2w + 1, then
T = C 1 ∪ C2 ∪ · · · ∪ C2w = C 1 ∪ C3 ∪ · · · ∪ C2w−1 ,
as C 2i = C i . Hence T is the union of at most w q-cyclotomic cosets of size at most ordn (q),
yielding k ≥ n − ordn (q)w.
As we see in the proof of Theorem 5.1.7, it is possible for more than one value of δ to be
used to construct the same BCH code. The binary Golay code provides a further example.
Example 5.1.8 In Example 4.5.5, we saw that the [23, 12, 7] binary Golay code is a cyclic
code with defining set T = C1 . Thus it is a narrow-sense BCH code with designed distance
δ = 2. As C1 = C2 = C3 = C4 , it is also a BCH code with designed distance any of 3, 4,
or 5.
Because of examples such as this, we call the largest designed distance δ ′ defining a BCH
code C the Bose distance of C. Thus we have d ≥ δ ′ ≥ δ, where d is the actual minimum
distance of C by Theorem 5.1.1. For the Golay code of Example 5.1.8, the Bose distance is
5; notice that the true minimum distance is 7, which is still greater than the Bose distance. As
we saw in Chapter 4, there are techniques to produce lower bounds on the minimum distance
which, when applied to BCH codes, may produce lower bounds above the Bose distance.
Determining the actual minimum distance of specific BCH codes is an important, but
difficult, problem. Section 3 of [50] discusses this problem extensively. Tables of minimum
distances for very long codes have been produced. For example, Table 2 of [50] contains
a list of primitive narrow-sense binary BCH codes whose minimum distance is the Bose
distance. As an illustration of the lengths of codes involved, the primitive narrow-sense
binary BCH code of length n = 2^{4199} − 1 and designed distance 49 actually has minimum
distance 49, as does every such code of length n = 2^{4199k} − 1.
For narrow-sense BCH codes, the BCH Bound has been very good in general. In fact,
that is how the minimum weights of many of the BCH codes of high dimension have been
determined: find a vector in the code whose weight is the Bose designed distance. There
has been a great deal of effort to find the true minimum distance of all primitive narrow-sense binary BCH codes of a fixed length. This has been done completely for lengths 3,
7, 31, 63, 127, and 255; for lengths up to 127 see Figure 9.1 of [218] and Table 2 of [50]
and for length 255 see [10]. The true minimum distance of all but six codes of length
511 has been found in [10] and [46]; see also [50]. Of the 51 codes of length 511 whose
minimum distance is known, 46 have minimum distance equal to their Bose distance, four
have minimum distance equal to two more than their Bose distance, and one has minimum
distance four more than its Bose distance (the [511, 103, 127] code of designed distance
123). The following conjecture has been formulated by Charpin.
Conjecture Every binary primitive narrow-sense BCH code of Bose designed distance δ
has minimum distance no more than δ + 4.
If the code is not primitive, some codes fail to satisfy this conjecture. For example, the
binary narrow-sense BCH code of length 33 and designed distance 5 has minimum distance
10 (see [203]).
As we see, finding the minimum distance for a specific code or a family of codes, such
as BCH codes, has been an important area of research. In this connection, it would be
very useful to have improved algorithms for accomplishing this. There is also interest in
determining as much as one can about the weight distribution or automorphism group of
a specific code or a family of codes. For example, in Theorem 5.1.9 we will show that
extended narrow-sense primitive BCH codes are affine-invariant; the full automorphism
groups of these codes are known [16].
By Theorem 4.2.6 and Exercises 238 and 243, the dual, even-like subcode, and cyclic
complement of a cyclic code C are all cyclic. If C is a BCH code, are its dual, even-like
subcode, and cyclic complement also BCH? In general the answer is no to each of these,
although in certain cases some of these are BCH. For example, suppose that C is a narrow-sense BCH code of length n and designed distance δ. Then as 1 + δ − 2 ≤ n − 1, 0 is not in
the defining set T of C, and hence, by Exercise 238, C is odd-like. Thus T = C1 ∪ C2 ∪ · · ·
∪ Cδ−1 while the defining set of the even-like subcode C e is T ∪ {0} = C0 ∪ C1 ∪ · · ·
∪ Cδ−1 ; hence C e is a BCH code of designed distance δ + 1.
Exercise 289 Let 2 ≤ δ ≤ 15.
(a) Give the defining set of all binary BCH codes of designed distance δ and length 15.
(b) What is the Bose distance of each of the codes in part (a)?
(c) What is the defining set of each of the duals of the codes in part (a)? Which of these
duals are BCH?
(d) What is the defining set of each of the even-like subcodes of the codes in part (a)? Which
of these even-like subcodes are BCH?
(e) What is the defining set of each of the cyclic complements of the codes in part (a)?
Which of these cyclic complements are BCH?
(f ) Find the defining sets of all binary cyclic codes of length 15 that are not BCH.
(g) Find the minimum weight of the BCH codes of length 15.
Let C be a primitive narrow-sense BCH code of length n = q^t − 1 over Fq with designed
distance δ. The defining set T is C1 ∪ C2 ∪ · · · ∪ Cδ−1 . The extended BCH code Ĉ has
defining set T̂ = {0} ∪ T. The reader is asked to show in Exercise 290 that if s ∈ T̂ and
r ⪯ s, then r ∈ T̂, where ⪯ is the partial order on N̂ = {0, 1, . . . , n} of Section 4.7. By
Theorem 4.7.6, Ĉ is affine-invariant, and by Corollary 4.7.9 the minimum weight of C is its
minimum odd-like weight. So we have the following theorem.
Theorem 5.1.9 Let C be a primitive narrow-sense BCH code of length n = q^t − 1 over Fq
with designed distance δ. Then Ĉ is affine-invariant, and the minimum weight of C is its
minimum odd-like weight.
Exercise 290 Let Ĉ be an extended BCH code of length q^t over Fq with defining set
T̂ = {0} ∪ C1 ∪ C2 ∪ · · · ∪ Cδ−1 . Let q = p^m, where p is a prime, and let n = q^t − 1 =
p^{mt} − 1. Show, by carrying out the following, that if s ∈ T̂ and r ⪯ s, then r ∈ T̂.
(a) Prove that if r ⪯ s, then r′ ⪯ s′, where r′ ≡ qr (mod n) and s′ ≡ qs (mod n).
(b) Prove that if there exists an s ∈ T̂ and an r ⪯ s but r ∉ T̂, then there is an s′′ ∈ T̂ with
s′′ ≤ δ − 1 and an r′′ ⪯ s′′ such that r′′ ∉ T̂.
(c) Prove that r′′ and s′′ from part (b) do not exist.
When examining a family of codes, it is natural to ask if this family is asymptotically
“good” or “bad” in the following sense. We say that a family of codes is asymptotically
good provided that there exists an infinite subset of [n i , ki , di ] codes from this family
with limi→∞ n i = ∞ such that both lim infi→∞ ki /n i > 0 and lim infi→∞ di /n i > 0. For
example, codes that meet the Asymptotic Gilbert–Varshamov Bound are asymptotically
good. The family is asymptotically bad if no asymptotically good subfamily exists. Recall
from Section 2.10 that for an [n, k, d] code, the ratio k/n is called the rate of the code and
the ratio d/n is called the relative distance of the code. The former measures the number of
information coordinates relative to the total number of coordinates, and the latter measures
the error-correcting capability of the code. Ideally, we would like the rate and relative
distance to both be high, in order to be able to send a large number of messages while
correcting a large number of errors. But these are conflicting goals. So in a family of good
codes, we want an infinite subfamily where both the code rates and relative distances are
bounded away from 0; hence in this subfamily neither rate nor relative distance are low. In
general, the rates and relative distances for any class of codes are difficult or impossible to
determine. Unfortunately, primitive BCH codes are known to be bad [198].
Theorem 5.1.10 The family of primitive BCH codes over Fq is asymptotically bad.
Note that this negative result does not say that individual codes, particularly those of
modest length, are not excellent codes. BCH codes, or codes constructed from them, are
often the codes closest to optimal that are known [32].
As a corollary of Theorem 5.1.10, we see that the primitive narrow-sense BCH codes
are asymptotically bad. These codes also have extensions that are affine-invariant by Theorem 5.1.9. That primitive narrow-sense BCH codes are asymptotically bad is also
a corollary of the following result of Kasami [161].
Theorem 5.1.11 Any family of cyclic codes whose extensions are affine-invariant is asymptotically bad.
It is natural to ask whether or not there is any asymptotically good family of codes.
The answer is yes, as the Asymptotic Gilbert–Varshamov Bound implies. As we saw, the
proof of this bound is nonconstructive; it shows that a family of good codes exists but
does not give a construction of such a family. In Section 2.11 we saw that the lexicodes
meet the Asymptotic Gilbert–Varshamov Bound and hence are asymptotically good. We
will examine another family of codes in Chapter 13 that meets the Asymptotic Gilbert–
Varshamov Bound.
5.2
Reed–Solomon codes
In this section we will define Reed–Solomon codes as a subfamily of BCH codes. We will
also give another equivalent definition for the narrow-sense Reed–Solomon codes that will
allow us to generalize these important codes.
A Reed–Solomon code, abbreviated RS code, C over Fq is a BCH code of length n =
q − 1. Thus ordn (q) = 1 implying that all irreducible factors of x n − 1 are of degree 1 and all
q-cyclotomic cosets modulo n have size 1. In fact, the roots of x n − 1 are exactly the nonzero
elements of Fq , and a primitive nth root of unity is a primitive element of Fq . So if C has
designed distance δ, the defining set of C has size δ − 1 and is T = {b, b + 1, . . . , b + δ − 2}
for some integer b. By Theorem 5.1.1 and the Singleton Bound, the dimension k and
minimum distance d of C satisfy k = n − δ + 1 ≥ n − d + 1 ≥ k. Thus both inequalities
are equalities implying d = δ and k = n − d + 1. In particular, C is MDS. We summarize
this information in the following theorem.
Theorem 5.2.1 Let C be an RS code over Fq of length n = q − 1 and designed distance δ.
Then:
(i) C has defining set T = {b, b + 1, . . . , b + δ − 2} for some integer b,
(ii) C has minimum distance d = δ and dimension k = n − d + 1, and
(iii) C is MDS.
Recall that in general the dual and cyclic complement of a BCH code are not BCH codes;
that is not the case with RS codes. Suppose that T is the defining set for an RS code C
of length n and designed distance δ. Then T is a set of δ − 1 consecutive elements from
N = {0, 1, . . . , n − 1}. By Theorem 4.4.6, the defining set of the cyclic complement C c
of C is N \ T , which is a set of n − δ + 1 consecutive elements implying that C c is RS.
Similarly, as (−1)T mod n is also a set of δ − 1 consecutive elements from N , we have
using Theorem 4.4.9 that the defining set N \ (−1)T mod n of C ⊥ is a consecutive set of
n − δ + 1 elements also. Therefore C ⊥ is an RS code.
Example 5.2.2 A primitive element of F13 is 2. Let C be the narrow-sense Reed–Solomon
code of designed distance 5 over F13 . It is a code of length 12 with defining set {1, 2, 3, 4}
and generator polynomial (x − 2)(x − 22 )(x − 23 )(x − 24 ) = 10 + 2x + 7x 2 + 9x 3 + x 4 .
By Theorem 5.2.1, C has minimum distance 5 and C is a [12, 8, 5] MDS code. C ⊥
is the [12, 4, 9] Reed–Solomon code with defining set {0, 1, 2, 3, 4, 5, 6, 7} and generator polynomial (x − 20 )(x − 21 )(x − 22 ) · · · (x − 27 ) = 3 + 12x + x 2 + 5x 3 + 11x 4 +
4x 5 + 10x 6 + 5x 7 + x 8 . The cyclic complement of C is the [12, 4, 9] Reed–Solomon
code with defining set {5, 6, 7, 8, 9, 10, 11, 0} and generator polynomial (x − 20 )(x − 25 )
(x − 26 ) · · · (x − 211 ) = 9 + 6x + 12x 2 + 10x 3 + 8x 4 + 6x 5 + 9x 6 + 4x 7 + x 8 .
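The generator polynomial of the narrow-sense code in Example 5.2.2 can be checked by multiplying out the linear factors over F13. A sketch (our own helper, coefficient lists written lowest degree first):

```python
def polymul_mod(a, b, p):
    """Product of two polynomials over F_p; coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

p, alpha = 13, 2                     # 2 is a primitive element of F_13
g = [1]
for i in (1, 2, 3, 4):               # defining set {1, 2, 3, 4}
    g = polymul_mod(g, [(-pow(alpha, i, p)) % p, 1], p)
print(g)                             # → [10, 2, 7, 9, 1], i.e. 10 + 2x + 7x^2 + 9x^3 + x^4
```

The same loop over the exponents 0 through 7 (respectively 0 and 5 through 11) reproduces the generator polynomials of the dual and of the cyclic complement.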
We now present an alternative formulation of narrow-sense Reed–Solomon codes, which
is the original formulation of Reed and Solomon. This alternative formulation of narrowsense RS codes is of particular importance because it is the basis for the definitions
of generalized Reed–Solomon codes, Goppa codes, and algebraic geometry codes, as
we will see in Chapter 13. For k ≥ 0, let P k denote the set of polynomials of degree
less than k, including the zero polynomial, in Fq [x]. Note that P 0 is precisely the zero
polynomial.
Theorem 5.2.3 Let α be a primitive element of Fq and let k be an integer with 0 ≤ k ≤
n = q − 1. Then
C = {( f (1), f (α), f (α 2 ), . . . , f (α q−2 )) | f ∈ P k }
is the narrow-sense [n, k, n − k + 1] RS code over Fq .
Proof: Clearly C is a linear code over Fq as P k is a linear subspace over Fq of Fq[x]. As P k
is k-dimensional, C will also be k-dimensional if we can show that if f and f1 are distinct
elements of P k, then the corresponding elements in C are distinct. If the latter are equal,
then their difference is the zero codeword, implying that f − f1, which is a nonzero polynomial of degree
at most k − 1, has n ≥ k roots, contradicting Exercise 159. Thus C is k-dimensional.
Let D be the narrow-sense [n, k, n − k + 1] RS code over Fq. So D has defining set T =
{1, 2, . . . , n − k}. We show that C = D; it suffices to prove that C ⊆ D as both codes are k-dimensional. Let c(x) = Σ_{j=0}^{n−1} c_j x^j ∈ C. Then there exists some f(x) = Σ_{m=0}^{k−1} f_m x^m ∈ P k
such that c_j = f(α^j) for 0 ≤ j < n. To show that c(x) ∈ D we need to show that c(α^i) = 0
for i ∈ T by Theorem 4.4.2. If i ∈ T, then
c(α^i) = Σ_{j=0}^{n−1} c_j α^{ij} = Σ_{j=0}^{n−1} Σ_{m=0}^{k−1} f_m α^{jm} α^{ij} = Σ_{m=0}^{k−1} f_m Σ_{j=0}^{n−1} α^{(i+m)j} = Σ_{m=0}^{k−1} f_m · (α^{(i+m)n} − 1)/(α^{i+m} − 1).
But α^{(i+m)n} = 1 and α^{i+m} ≠ 1 as 1 ≤ i + m ≤ n − 1 = q − 2 and α is a primitive nth root
of unity. Therefore c(α^i) = 0 for i ∈ T, implying that C ⊆ D. Hence C = D.
Exercise 291 Let ev : P k → F_q^n be given by
ev( f ) = ( f (1), f (α), f (α 2 ), . . . , f (α q−2 )),
where α is a primitive element of Fq and n = q − 1. Prove that the evaluation map ev is a
nonsingular linear transformation.
Note that the narrow-sense RS code defined in this alternate sense with k = 0 is precisely
the zero code.
This alternate formulation of narrow-sense RS codes gives an alternate encoding
scheme as well. Suppose that f0, f1, . . . , fk−1 are k information symbols and f(x) = f0 +
f1x + · · · + fk−1x^{k−1}; then we encode

(f0, f1, . . . , fk−1) −→ (f(1), f(α), . . . , f(α^{q−2})).    (5.2)
Notice that this encoding scheme is not systematic. There is a decoding scheme for RS
codes that is unusual in the sense that it finds the information symbols directly, under the
assumption that they have been encoded using (5.2). It is not syndrome decoding but is
an instance of a decoding scheme based on majority logic and was the original decoding
developed by Reed and Solomon [294]. Currently other schemes are used for decoding and
for that reason we do not present the original decoding here.
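Encoding rule (5.2) is a direct polynomial evaluation. The sketch below (helper names and the sample message are ours) encodes a message over F13 and then confirms that the codeword lies in the narrow-sense RS code by checking c(α^i) = 0 on the defining set:

```python
p, alpha = 13, 2                 # F_13 with primitive element 2
n, k = p - 1, 8                  # the [12, 8, 5] narrow-sense RS code of Example 5.2.2

def evaluate(f, x):
    """Horner evaluation of f (coefficients lowest degree first) at x over F_p."""
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

def rs_encode(message):
    """Scheme (5.2): (f_0, ..., f_{k-1}) -> (f(1), f(alpha), ..., f(alpha^{q-2}))."""
    return [evaluate(message, pow(alpha, j, p)) for j in range(n)]

c = rs_encode([3, 1, 4, 1, 5, 9, 2, 6])
# Membership check: c(alpha^i) = 0 for every i in the defining set T = {1, ..., n-k}.
for i in range(1, n - k + 1):
    assert sum(cj * pow(alpha, i * j, p) for j, cj in enumerate(c)) % p == 0
```

As the text notes, this encoder is not systematic: the information symbols f0, . . . , fk−1 do not appear verbatim in the codeword.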
5.3
Generalized Reed–Solomon codes
The construction of the narrow-sense RS codes in Theorem 5.2.3 can be generalized to
(possibly noncyclic) codes. Let n be any integer with 1 ≤ n ≤ q. Choose γ = (γ0 , . . . ,γn−1 )
to be an n-tuple of distinct elements of Fq , and v = (v0 , . . . , vn−1 ) to be an n-tuple of
nonzero, but not necessarily distinct, elements of Fq . Let k be an integer with 1 ≤ k ≤ n.
Then the codes
GRSk (γ, v) = {(v0 f (γ0 ), v1 f (γ1 ), . . . , vn−1 f (γn−1 )) | f ∈ P k }
are the generalized Reed–Solomon or GRS codes. Because no vi is 0, by repeating the proof
of Theorem 5.2.3, we see that GRSk (γ,v) is k-dimensional. Because a nonzero polynomial
f ∈ P k has at most k − 1 zeros, GRSk (γ, v) has minimum distance at least n − (k − 1) =
n − k + 1. By the Singleton Bound, it has minimum distance at most n − k + 1; hence,
GRSk (γ, v) has minimum distance exactly n − k + 1. Thus GRS codes are also MDS, as
were RS codes. It is obvious that if w is another n-tuple of nonzero elements of Fq , then
GRSk (γ, v) is monomially equivalent to GRSk (γ, w). The narrow-sense RS codes are GRS
codes with n = q − 1, γi = α i , where α is a primitive nth root of unity, and vi = 1 for
0 ≤ i ≤ n − 1. We summarize this information in the following theorem.
Theorem 5.3.1 With the notation as above:
(i) GRSk (γ, v) is an [n, k, n − k + 1] MDS code,
(ii) GRSk (γ, v) is monomially equivalent to GRSk (γ, w), and
(iii) narrow-sense RS codes are GRS codes with n = q − 1, γi = α i , and vi = 1 for 0 ≤
i ≤ n − 1.
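For small parameters, the MDS property in Theorem 5.3.1(i) can be confirmed by exhaustive enumeration. A sketch with illustrative (not canonical) choices of γ and v over F7:

```python
from itertools import product

p = 7
gamma = [0, 1, 2, 3, 4]          # distinct elements of F_7 (illustrative choice)
v = [1, 3, 2, 6, 5]              # nonzero multipliers (illustrative choice)
n, k = len(gamma), 2

def ev(f, x):
    """Horner evaluation of f (coefficients lowest degree first) over F_p."""
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

# Enumerate all codewords (v_0 f(gamma_0), ..., v_{n-1} f(gamma_{n-1})), f in P_k.
codewords = [[vi * ev(f, gi) % p for gi, vi in zip(gamma, v)]
             for f in product(range(p), repeat=k)]
nonzero_weights = [sum(1 for x in c if x) for c in codewords if any(c)]
print(len(codewords), min(nonzero_weights))   # → 49 4
```

The minimum nonzero weight is n − k + 1 = 4, as Theorem 5.3.1 predicts: a nonzero f of degree at most 1 vanishes at no more than one of the five distinct evaluation points.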
Narrow-sense [q − 1, k, q − k] Reed–Solomon codes over Fq can be extended to MDS
codes as follows. Let C = {( f (1), f (α), f (α 2 ), . . . , f (α q−2 )) | f ∈ P k } be such a code.
Exercise 292 shows that if f ∈ P k , where k < q, then β∈Fq f (β) = 0. So
C = {( f (1), f (α), f (α 2 ), . . . , f (α q−2 ), f (0)) | f ∈ P k }
is the extension of C. Notice that this is also a GRS code with n = q, γi = α i for 0 ≤ i ≤
n − 2, γn−1 = 0, and vi = 1 for 0 ≤ i ≤ n − 1. Therefore
C is a [q, k, q − k + 1] MDS
code. In other words, when extending the narrow sense RS codes by adding an overall
parity check, the minimum weight increases, an assertion that can be guaranteed in general
for codes over arbitrary fields only if the minimum weight vectors are all odd-like. This
results in the following theorem.
Theorem 5.3.2 The [q, k, q − k + 1] extended narrow-sense Reed–Solomon code over Fq
is generalized Reed–Solomon and MDS.
Exercise 292 Prove that if f ∈ P k , where k < q, then β∈Fq f (β) = 0. See Exercise
164.
We now show that the dual of a GRS code is also GRS.
Theorem 5.3.3 Let γ = (γ0 , . . . , γn−1 ) be an n-tuple of distinct elements of Fq and let
v = (v0 , . . . , vn−1 ) be an n-tuple of nonzero elements of Fq . Then there exists an n-tuple
w = (w0 , . . . , wn−1 ) of nonzero elements of Fq such that GRSk (γ, v)⊥ = GRSn−k (γ, w)
for all k with 0 ≤ k ≤ n − 1. Furthermore, the vector w is any nonzero codeword in the
1-dimensional code GRSn−1 (γ, v)⊥ and satisfies

  ∑_{i=0}^{n−1} wi vi h(γi ) = 0     (5.3)

for any polynomial h ∈ P n−1 .
Proof: Let C = GRSk (γ, v). We first consider the case k = n − 1. Since the dual of an
MDS code is also MDS by Theorem 2.4.3, C ⊥ is an [n, 1, n] code with a basis vector
w = (w0 , w1 , . . . , wn−1 ) having no zero components. But the code GRS1 (γ, w) is
precisely all multiples of w, implying that C ⊥ = GRS1 (γ, w), verifying the result when
k = n − 1. This also shows that if h is any polynomial in P n−1 , then (5.3) holds because
(v0 h(γ0 ), . . . , vn−1 h(γn−1 )) ∈ GRSn−1 (γ, v) = GRS1 (γ, w)⊥ . Now let 0 ≤ k ≤ n − 1.
When f ∈ P k and g ∈ P n−k , h = f g ∈ P n−1 . Thus, by (5.3),

  ∑_{i=0}^{n−1} wi g(γi )vi f (γi ) = ∑_{i=0}^{n−1} wi vi h(γi ) = 0.

Therefore GRSn−k (γ, w) ⊆ GRSk (γ, v)⊥ . Since the dimension of
GRSk (γ, v)⊥ is n − k, the theorem follows.
A generator matrix of GRSk (γ, v) is

        [ v0           v1           · · ·  vn−1            ]
        [ v0 γ0        v1 γ1        · · ·  vn−1 γn−1       ]
  G  =  [ v0 γ0^2      v1 γ1^2      · · ·  vn−1 γn−1^2     ]     (5.4)
        [     .            .                    .          ]
        [ v0 γ0^{k−1}  v1 γ1^{k−1}  · · ·  vn−1 γn−1^{k−1} ]
By Theorem 5.3.3, a parity check matrix of GRSk (γ, v) is the generator matrix of
GRSn−k (γ, w), where w is given in Theorem 5.3.3. Therefore a parity check matrix for
GRSk (γ, v) is

        [ w0             w1             · · ·  wn−1              ]
        [ w0 γ0          w1 γ1          · · ·  wn−1 γn−1         ]
  H  =  [ w0 γ0^2        w1 γ1^2        · · ·  wn−1 γn−1^2       ] .
        [      .              .                      .           ]
        [ w0 γ0^{n−k−1}  w1 γ1^{n−k−1}  · · ·  wn−1 γn−1^{n−k−1} ]
Exercise 293 Prove that the matrix given in (5.4) is a generator matrix of
GRSk (γ, v).
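Theorem 5.3.3 can also be checked numerically. The formula wi = (vi ∏_{j≠i} (γi − γj ))^{−1} used below is one classical choice of the vector w (an assumption on our part; the theorem only asserts existence). The sketch verifies both (5.3) for every monomial in P n−1 and the orthogonality of the matrices (5.4) for GRSk (γ, v) and GRSn−k (γ, w), again over F7.

```python
q, k = 7, 3
gamma = list(range(q)); n = len(gamma)
v = [1, 2, 3, 1, 2, 3, 1]

def inv(a): return pow(a, q - 2, q)          # inverse in F_7 via Fermat

# candidate w_i = (v_i * prod_{j != i} (gamma_i - gamma_j))^{-1}  (classical choice)
w = []
for i in range(n):
    p = v[i]
    for j in range(n):
        if j != i:
            p = p * (gamma[i] - gamma[j]) % q
    w.append(inv(p))

# (5.3): sum_i w_i v_i h(gamma_i) = 0 for every monomial h(x) = x^r in P_{n-1}
ok_53 = all(sum(w[i] * v[i] * pow(gamma[i], r, q) for i in range(n)) % q == 0
            for r in range(n - 1))

# every row of (5.4) for GRS_k(gamma, v) is orthogonal to every row for GRS_{n-k}(gamma, w)
G = [[v[i] * pow(gamma[i], r, q) % q for i in range(n)] for r in range(k)]
H = [[w[i] * pow(gamma[i], r, q) % q for i in range(n)] for r in range(n - k)]
ok_dual = all(sum(g[i] * h[i] for i in range(n)) % q == 0 for g in G for h in H)
print(ok_53, ok_dual)
```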
We know from Theorem 5.3.1 that C = GRSk (γ, v) is MDS. We want to describe an
extension of C, denoted Č, that is also MDS. Let v be a nonzero element of Fq . The
generator matrix of Č is Ǧ = [G uT ], where u = (0, 0, . . . , 0, v). This extended code will
generally not be even-like. Choose w ∈ Fq so that
  ∑_{i=0}^{n−1} vi wi γi^{n−1} + vw = 0.
Such an element w exists as v ≠ 0. Using (5.3) and the definition of w, we leave it to the
reader in Exercise 294 to verify that Č has parity check matrix

        [ w0            w1            · · ·  wn−1             0 ]
        [ w0 γ0         w1 γ1         · · ·  wn−1 γn−1        0 ]
  Ȟ  =  [ w0 γ0^2       w1 γ1^2       · · ·  wn−1 γn−1^2      0 ] .
        [     .             .                     .           . ]
        [ w0 γ0^{n−k}   w1 γ1^{n−k}   · · ·  wn−1 γn−1^{n−k}  w ]
Notice that if w = 0, then ∑_{i=0}^{n−1} wi vi h(γi ) = 0 for all h ∈ P n , implying that
(w0 v0 , . . . , wn−1 vn−1 ) is a nonzero vector in Fq^n orthogonal to all of Fq^n , which is a contradiction. So w ≠ 0.
Exercise 294 Verify that Ȟ is a parity check matrix for Č.
We now verify that Č is MDS. Consider the (n − k + 1) × (n − k + 1) submatrix M of
Ȟ formed by any n − k + 1 columns of Ȟ . If the right-most column of Ȟ is not among the
n − k + 1 chosen, then M = V D, where V is a Vandermonde matrix and D is a diagonal
matrix. The entries of V are the powers 0 through n − k of n − k + 1 of the (distinct) γi s; the diagonal entries of
D are all chosen from {w0 , . . . , wn−1 }. As the γi s are distinct and the wi s are nonzero, the
determinants of V and D are both nonzero, using Lemma 4.5.1. Therefore M is nonsingular
in this case. Suppose the right-most column of Ȟ is among the n − k + 1 chosen. By
Theorem 2.4.3 any n − k columns of H are independent (and hence so are the corresponding
n − k columns of Ȟ ) as C is MDS. This implies that the right-most column of Ȟ must be
independent of any n − k other columns of Ȟ . So all of our chosen columns are independent.
Thus by Corollary 1.4.14, Č has minimum weight at least n − k + 2. By the Singleton
Bound, the minimum weight is at most n − k + 2 implying that Č is MDS.
In summary, this discussion and Theorem 5.3.1(i) prove the following theorem.
Theorem 5.3.4 For 1 ≤ k ≤ n ≤ q, the GRS code GRSk (γ, v) is an MDS code, and it can
be extended to an MDS code of length n + 1.
Recall that a [q − 1, k, q − k] narrow-sense RS code over Fq can be extended by adding
an overall parity check; the resulting [q, k, q − k + 1] code is a GRS code which is MDS
by Theorem 5.3.2. This code itself can be extended to an MDS code by Theorem 5.3.4.
Thus a narrow-sense RS code of length q − 1 can be extended twice to an MDS code of
length q + 1.
So, in general, GRS codes C and their extensions Č are MDS. There are MDS codes that
are not equivalent to such codes. However, no MDS code with parameters other than those
arising from GRS codes or their extensions is presently known [298].
5.4
Decoding BCH codes
In this section we present three algorithms for nearest neighbor decoding of BCH codes.
The first method is known as Peterson–Gorenstein–Zierler decoding. It was originally
developed for binary BCH codes by Peterson [254] in 1960 and generalized shortly
thereafter by Gorenstein and Zierler to nonbinary BCH codes [109]. We will describe
this decoding method as a four-step procedure. The second step of this procedure is
the most complicated and time consuming. The second method, known as Berlekamp–
Massey decoding, presents a more efficient alternate approach to carrying out step two
of the Peterson–Gorenstein–Zierler Algorithm. This decoding method was developed by
Berlekamp in 1967 [18]. Massey [224] recognized that Berlekamp’s method provided
a way to construct the shortest linear feedback shift-register capable of generating a
specified sequence of digits. The third decoding algorithm, discovered by Sugiyama,
Kasahara, Hirasawa, and Namekawa in 1975 [324], is also an alternate method to execute
the second step of the Peterson–Gorenstein–Zierler Algorithm. Known as the Sugiyama
Algorithm, it is a simple, yet powerful, application of the Euclidean Algorithm for
polynomials.
In this section we also present the main ideas in a list decoding algorithm which can
be applied to decoding generalized Reed–Solomon codes. This algorithm, known as the
Sudan–Guruswami Algorithm, will accomplish decoding beyond the packing radius, that
is, the bound obtained from the minimum distance of the code. When decoding beyond the
packing radius, one must expect more than one nearest codeword to the received vector
by Theorem 1.11.4. The Sudan–Guruswami Algorithm produces a complete list of all
codewords within a certain distance of the received vector. While we present this algorithm
applied to generalized Reed–Solomon codes, it can be used to decode BCH codes, Goppa
codes, and algebraic geometry codes with some modifications.
5.4.1
The Peterson–Gorenstein–Zierler Decoding Algorithm
Let C be a BCH code over Fq of length n and designed distance δ. As the minimum distance
of C is at least δ, C can correct at least t = ⌊(δ − 1)/2⌋ errors. The Peterson–Gorenstein–
Zierler Decoding Algorithm will correct up to t errors. While the algorithm will apply to
any BCH code, the proofs are simplified if we assume that C is narrow-sense. Therefore
the defining set T of C will be assumed to contain {1, 2, . . . , δ − 1}, with α the primitive
nth root of unity in the extension field Fq m of Fq , where m = ordn (q), used to determine
this defining set. The algorithm requires four steps, which we describe in order and later
summarize.
Suppose that y(x) is received, where we assume that y(x) differs from a codeword c(x)
in at most t coordinates. Therefore y(x) = c(x) + e(x) where c(x) ∈ C and e(x) is the error
vector which has weight ν ≤ t. Suppose that the errors occur in the unknown coordinates
k1 , k2 , . . . , kν . Therefore
e(x) = ek1 x k1 + ek2 x k2 + · · · + ekν x kν .
(5.5)
Once we determine e(x), which amounts to finding the error locations k j and the error magnitudes ek j , we can decode the received vector as c(x) = y(x) − e(x). Recall
by Theorem 4.4.2 that c(x) ∈ C if and only if c(α i ) = 0 for all i ∈ T . In particular
y(α i ) = c(α i ) + e(α i ) = e(α i ) for all 1 ≤ i ≤ 2t, since 2t ≤ δ − 1. For 1 ≤ i ≤ 2t we define the syndrome Si of y(x) to be the element Si = y(α i ) in Fq m . (Exercise 295 will explore the connection between this notion of syndrome and that developed in Section 1.11.)
The first step in the algorithm is to compute the syndromes Si = y(α i ) for 1 ≤ i ≤ 2t from
the received vector. This process is aided by the following theorem proved in Exercise 296.
In the theorem we allow Si to be defined as y(α i ) even when i > 2t; these may not be
legitimate syndromes as c(α i ) may not be 0 in those cases.
Theorem 5.4.1 S_i^q = S_{iq} for all i ≥ 1.
Exercise 295 Let H be the t × n matrix

        [ 1   α     α^2     · · ·  α^{n−1}     ]
        [ 1   α^2   α^4     · · ·  α^{(n−1)2}  ]
  H  =  [ .    .     .              .          ] .
        [ 1   α^t   α^{2t}  · · ·  α^{(n−1)t}  ]

If y(x) = y0 + y1 x + · · · + yn−1 x^{n−1} , let y = (y0 , y1 , . . . , yn−1 ). Finally, let S =
(S1 , S2 , . . . , St ), where Si = y(α^i ).
(a) Show that H yT = ST .
(b) Use Theorem 4.4.3 and part (a) to explain the connection between the notion of syndrome
given in this section and the notion of syndrome given in Section 1.11.
Exercise 296 Prove Theorem 5.4.1.
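Theorem 5.4.1 is easy to spot-check in code. The sketch below is our own illustration: it works over F16 with α^4 = 1 + α (field elements stored as 4-bit integers), takes an arbitrary binary word y of length 15, and confirms that squaring Si gives S2i, as the theorem asserts for q = 2.

```python
# antilog table for F_16 with alpha^4 = 1 + alpha (field elements as 4-bit integers)
exp = []
x = 1
for _ in range(15):
    exp.append(x)
    x <<= 1                 # multiply by alpha
    if x & 0x10:
        x ^= 0x13           # replace alpha^4 by 1 + alpha
log = {exp[i]: i for i in range(15)}
def mul(a, b): return 0 if 0 in (a, b) else exp[(log[a] + log[b]) % 15]

y = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]     # arbitrary binary word of length 15
def S(i):
    """Syndrome-style evaluation S_i = y(alpha^i)."""
    out = 0
    for pos, bit in enumerate(y):
        if bit:
            out ^= exp[(i * pos) % 15]
    return out

# Theorem 5.4.1 with q = 2: squaring S_i gives S_{2i}
assert all(mul(S(i), S(i)) == S(2 * i) for i in range(1, 8))
print([S(i) for i in range(1, 5)])
```

The identity holds for every binary y since the Frobenius map fixes F2 coefficients, which is exactly the content of Exercise 296.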
The syndromes lead to a system of equations involving the unknown error locations and
the unknown error magnitudes. Notice that from (5.5) the syndromes satisfy
  Si = y(α^i ) = ∑_{j=1}^{ν} e_{k_j} (α^i )^{k_j} = ∑_{j=1}^{ν} e_{k_j} (α^{k_j} )^i ,     (5.6)
for 1 ≤ i ≤ 2t. To simplify the notation, for 1 ≤ j ≤ ν, let E j = ek j denote the error
magnitude at coordinate k j and X j = α k j denote the error location number corresponding
to the error location k j . By Theorem 3.3.1, if α i = α k for i and k between 0 and n − 1,
then i = k. Thus knowing X j uniquely determines the error location k j . With this notation
(5.6) becomes
  Si = ∑_{j=1}^{ν} E j X j^i , for 1 ≤ i ≤ 2t,     (5.7)
which in turn leads to the system of equations:
  S1 = E 1 X 1 + E 2 X 2 + · · · + E ν X ν ,
  S2 = E 1 X 1^2 + E 2 X 2^2 + · · · + E ν X ν^2 ,
  S3 = E 1 X 1^3 + E 2 X 2^3 + · · · + E ν X ν^3 ,
    .                                                (5.8)
    .
  S2t = E 1 X 1^{2t} + E 2 X 2^{2t} + · · · + E ν X ν^{2t} .
This system is nonlinear in the X j s with unknown coefficients E j . The strategy is to use
(5.7) to set up a linear system, involving new variables σ1 , σ2 , . . . , σν , that will lead directly
181
5.4 Decoding BCH codes
to the error location numbers. Once these are known, we return to the system (5.8), which
is then a linear system in the E j s and solve for the error magnitudes.
To this end, define the error locator polynomial to be
  σ (x) = (1 − x X 1 )(1 − x X 2 ) · · · (1 − x X ν ) = 1 + ∑_{i=1}^{ν} σi x^i .

The roots of σ (x) are the inverses of the error location numbers and thus

  σ (X j^{−1} ) = 1 + σ1 X j^{−1} + σ2 X j^{−2} + · · · + σν X j^{−ν} = 0

for 1 ≤ j ≤ ν. Multiplying by E j X j^{i+ν} produces

  E j X j^{i+ν} + σ1 E j X j^{i+ν−1} + · · · + σν E j X j^i = 0
for any i. Summing this over j for 1 ≤ j ≤ ν yields
  ∑_{j=1}^{ν} E j X j^{i+ν} + σ1 ∑_{j=1}^{ν} E j X j^{i+ν−1} + · · · + σν ∑_{j=1}^{ν} E j X j^i = 0.     (5.9)
As long as 1 ≤ i and i + ν ≤ 2t, these summations are the syndromes obtained in (5.7).
Because ν ≤ t, (5.9) becomes
Si+ν + σ1 Si+ν−1 + σ2 Si+ν−2 + · · · + σν Si = 0
or
σ1 Si+ν−1 + σ2 Si+ν−2 + · · · + σν Si = −Si+ν
(5.10)
valid for 1 ≤ i ≤ ν. Thus we can find the σk s if we solve the matrix equation

  [ S1   S2    S3    · · ·  Sν−1   Sν    ] [ σν   ]   [ −Sν+1 ]
  [ S2   S3    S4    · · ·  Sν     Sν+1  ] [ σν−1 ]   [ −Sν+2 ]
  [ S3   S4    S5    · · ·  Sν+1   Sν+2  ] [ σν−2 ] = [ −Sν+3 ]     (5.11)
  [  .    .     .            .      .    ] [  .   ]   [   .   ]
  [ Sν   Sν+1  Sν+2  · · ·  S2ν−2  S2ν−1 ] [ σ1   ]   [ −S2ν  ]
that arises from (5.10). The second step of our algorithm is to solve (5.11) for σ1 , . . . , σν .
Once this second step has been completed, σ (x) has been determined. However, determining σ (x) is complicated by the fact that we do not know ν, and hence we do not know
the size of the system involved. We are searching for the solution which has the smallest
value of ν, and this is aided by the following lemma.
Lemma 5.4.2 Let µ ≤ t and let

        [ S1   S2    · · ·  Sµ    ]
  Mµ =  [ S2   S3    · · ·  Sµ+1  ] .
        [  .    .            .    ]
        [ Sµ   Sµ+1  · · ·  S2µ−1 ]
Then Mµ is nonsingular if µ = ν and singular if µ > ν, where ν is the number of errors
that have occurred.
Proof: If µ > ν, let X ν+1 = X ν+2 = · · · = X µ = 0 and E ν+1 = E ν+2 = · · · = E µ = 0.
Exercise 297 shows that if Aµ and Bµ are given by

        [ 1         1         · · ·  1         ]           [ E 1 X 1   0        · · ·  0       ]
  Aµ =  [ X1        X2        · · ·  Xµ        ]  and Bµ = [ 0         E 2 X 2  · · ·  0       ] ,
        [  .         .                .        ]           [  .         .               .      ]
        [ X1^{µ−1}  X2^{µ−1}  · · ·  Xµ^{µ−1}  ]           [ 0         0        · · ·  E µ X µ ]

then Mµ = Aµ Bµ Aµ^T . Therefore det Mµ = det Aµ det Bµ det Aµ . If µ > ν, det Bµ = 0 as
Bµ is a diagonal matrix with 0 on the diagonal. If µ = ν, then det Bµ ≠ 0 as Bµ is a
diagonal matrix with only nonzero entries on the diagonal. Also det Aµ ≠ 0 by Lemma 4.5.1
because Aµ is a Vandermonde matrix with X 1 , . . . , X µ distinct. Hence Mµ is nonsingular if
µ = ν.
Exercise 297 Do the following, where the notation is given in Lemma 5.4.2:
(a) Show that if µ > ν, Si = ∑_{j=1}^{µ} E j X j^i for 1 ≤ i ≤ 2t.
(b) Show that Mµ = Aµ Bµ Aµ^T .
To execute the second step of our algorithm, we attempt to guess the number ν of
errors. Call our guess µ and begin with µ = t, which is the largest that ν could be. The
coefficient matrix of the linear system (5.11) that we are attempting to solve is Mµ = Mt in
Lemma 5.4.2. If Mµ is singular, we reduce our guess µ to µ = t − 1 and decide whether or
not Mµ = Mt−1 is singular. As long as we obtain a singular matrix, we continue to reduce
our guess µ of the number of errors by one until some Mµ is nonsingular. With ν = µ,
solve (5.11) and thereby determine σ (x).
The third step is then to find the roots of σ (x) and invert them to determine the error
location numbers. This is usually done by exhaustive search checking σ (α i ) for 0 ≤ i < n.
The fourth step is to plug these numbers into (5.8) and solve this linear system for the
error magnitudes E j . In fact we only need to consider the first ν equations in (5.8) for the
following reason. The coefficient matrix of the first ν equations has determinant

      [ X1    X2    · · ·  Xν   ]                        [ 1         1         · · ·  1        ]
  det [ X1^2  X2^2  · · ·  Xν^2 ]  =  X1 X2 · · · Xν det [ X1        X2        · · ·  Xν       ] .
      [  .     .            .   ]                        [  .         .                .       ]
      [ X1^ν  X2^ν  · · ·  Xν^ν ]                        [ X1^{ν−1}  X2^{ν−1}  · · ·  Xν^{ν−1} ]
The right-hand side determinant is the determinant of a Vandermonde matrix; the latter is
nonzero as the X j s are distinct.
The Peterson–Gorenstein–Zierler Decoding Algorithm for BCH codes is therefore the
following:
I. Compute the syndromes Si = y(α i ) for 1 ≤ i ≤ 2t.
II. In the order µ = t, µ = t − 1, . . . decide if Mµ is singular, stopping at the first value
of µ where Mµ is nonsingular. Set ν = µ and solve (5.11) to determine σ (x).
III. Find the roots of σ (x) by computing σ (α i ) for 0 ≤ i < n. Invert the roots to get the
error location numbers X j .
IV. Solve the first ν equations of (5.8) to obtain the error magnitudes E j .
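The four steps above can be turned into a short program. The sketch below is our own code (the function name and the bit-list representation of y(x) are our conventions): it implements Steps I–III for a binary narrow-sense BCH code of length 15 with t fixed at 2, doing arithmetic in F16 with α^4 = 1 + α. Step IV is skipped since error magnitudes in a binary code are 1.

```python
# F_16 tables, alpha^4 = 1 + alpha
exp = []
x = 1
for _ in range(15):
    exp.append(x)
    x <<= 1
    if x & 0x10:
        x ^= 0x13
log = {exp[i]: i for i in range(15)}
def mul(a, b): return 0 if 0 in (a, b) else exp[(log[a] + log[b]) % 15]
def inv(a): return exp[(-log[a]) % 15]

def pgz_decode(y, t=2, n=15):
    """Peterson-Gorenstein-Zierler sketch for a binary narrow-sense BCH code, t = 2."""
    # Step I: syndromes S_1, ..., S_2t
    S = [0] * (2 * t + 1)
    for i in range(1, 2 * t + 1):
        for pos in range(n):
            if y[pos]:
                S[i] ^= exp[(i * pos) % 15]
    if not any(S):
        return list(y)                     # already a codeword
    # Step II: nu and sigma(x); try M_2, then M_1
    det2 = mul(S[1], S[3]) ^ mul(S[2], S[2])
    if det2:                               # M_2 nonsingular: nu = 2, Cramer's rule
        s2 = mul(mul(S[2], S[4]) ^ mul(S[3], S[3]), inv(det2))
        s1 = mul(mul(S[1], S[4]) ^ mul(S[2], S[3]), inv(det2))
        sigma = [1, s1, s2]
    elif S[1]:                             # M_1 = [S_1] nonsingular: nu = 1
        sigma = [1, S[1]]
    else:
        return None                        # more than t errors
    # Step III: sigma(alpha^{-p}) = 0  <=>  error at position p
    errs = []
    for p in range(n):
        val = 1 ^ mul(sigma[1], exp[(-p) % 15])
        if len(sigma) > 2:
            val ^= mul(sigma[2], exp[(-2 * p) % 15])
        if val == 0:
            errs.append(p)
    if len(errs) != len(sigma) - 1:
        return None                        # sigma does not split: more than t errors
    # Step IV is skipped: binary error magnitudes are 1
    c = list(y)
    for p in errs:
        c[p] ^= 1
    return c

y = [0] * 15
for p in (0, 1, 5, 6, 9, 10):   # a codeword of the [15, 7] code with g(x) =
    y[p] = 1                    # 1 + x^4 + x^6 + x^7 + x^8, bit errors at positions 4 and 10
print(pgz_decode(y))
```

On a word within distance t of a codeword this returns the codeword; when the syndromes are inconsistent with at most two errors it returns None, signalling that more than t errors occurred.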
We now discuss why this algorithm actually works. We are assuming that a codeword
has been transmitted and a vector received that differs from the transmitted codeword in
ν ≤ t coordinates. Thus there is only one correct set of error location numbers and one
correct set of error magnitudes. These lead to a unique error locator polynomial. Step II
must determine correctly the value of ν since, by Lemma 5.4.2, ν is the largest value
less than or equal to t such that Mν is nonsingular. Once we know the number of errors,
we solve (5.11) to obtain a possible solution for the unknown coefficients of the error
locator polynomial. Because the matrix of the linear system used is nonsingular and our
correct set of coefficients of the error locator polynomial must also be a solution, these
must agree. Thus Step II correctly determines the error locator polynomial and hence
Step III correctly determines the error location numbers. Once those are computed, the
first ν equations in (5.8) have a unique solution for the error magnitudes that Step IV
computes. Because the correct set of error magnitudes also is a solution, it must be the one
computed.
What happens if a received vector is more than distance t from every codeword? In that
case just about anything could happen. For example, the supposed error locator polynomial
σ (x) found in Step II may fail in Step III to have deg σ (x) distinct roots that are all nth roots
of unity. For instance, the roots might be repeated or they might lie in an extension field of
Fq but not be nth roots of unity. If this were to occur at Step III, the decoder should declare
that more than t errors have been made. Another problem could occur in Step IV. Suppose
an error locator polynomial has been found whose roots are all distinct nth roots of unity
and the number of these roots agrees with the degree of the error locator polynomial. Step
IV fails if the error magnitudes do not lie in Fq . This is certainly possible since the entries
in the coefficient matrix and the syndromes in (5.8) generally lie in an extension field of
Fq . Again were this to occur, the decoder should declare that more than t errors have been
made.
We make several remarks about this algorithm before presenting some examples.
r After the errors are found and the received vector is corrected, the resulting vector should
be checked to make sure it is in the code. (This can be accomplished, for example, either
by dividing the corrected vector c(x) by the generator polynomial to verify that the
generator polynomial is a factor of the corrected vector, or by computing c(α i ) and
verifying that these values are 0 for all i in the defining set.) If it is not, and all steps have
been performed correctly, more than t errors occurred.
r If the BCH code is binary, all error magnitudes must be 1. Hence Step IV can be skipped,
provided the corrected vector is verified to be in the code, as just remarked.
r If all the syndromes are 0 in Step I, the received vector is in fact a codeword and the
received vector should be considered to be the transmitted vector.
r As with all nearest neighbor decoders, the decoder will make a decoding error if the
received vector is more than distance t from the transmitted codeword but less than or
equal to distance t from some other codeword. The decoder will give the latter codeword
as the nearest one, precisely as it is designed to do.
r If the BCH code is not narrow-sense, the algorithm still works as presented.
r In addition to being the number of errors, ν is the length of the shortest linear feedback shift-register capable of generating the sequence S1 , S2 , . . . ; see (5.10).
Table 5.1 F16 with primitive element α, where α^4 = 1 + α

  0     0000        α^7     1011
  1     0001        α^8     0101
  α     0010        α^9     1010
  α^2   0100        α^10    0111
  α^3   1000        α^11    1110
  α^4   0011        α^12    1111
  α^5   0110        α^13    1101
  α^6   1100        α^14    1001
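Table 5.1 can be regenerated mechanically: repeatedly multiply by α and replace any α^4 term by 1 + α. A small sketch (our own; the 4-bit string is c3 c2 c1 c0, the coefficients of α^3, α^2, α, 1, matching the table):

```python
# regenerate Table 5.1: repeatedly multiply by alpha, reducing with alpha^4 = 1 + alpha
table = {}
x = 1
for i in range(15):
    table[i] = x
    x <<= 1              # multiply by alpha
    if x & 0x10:         # a degree-4 term appeared:
        x ^= 0x13        # replace alpha^4 by 1 + alpha (XOR with binary 10011)
print('0     0000')
for i in range(15):
    label = '1' if i == 0 else f'α^{i}'
    print(f'{label:<5} {table[i]:04b}')
```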
Example 5.4.3 Let C be the [15, 7] narrow-sense binary BCH code of designed distance
δ = 5, which has defining set T = {1, 2, 3, 4, 6, 8, 9, 12}. Using the primitive 15th root of
unity α from Table 5.1, the generator polynomial of C is
g(x) = 1 + x 4 + x 6 + x 7 + x 8 .
Suppose that C is used to transmit a codeword and y(x) = 1 + x + x 5 + x 6 + x 9 + x 10 is
received. Using Table 5.1 and Theorem 5.4.1, Step I produces
S1 = 1 + α + α 5 + α 6 + α 9 + α 10 = α 2 ,
S2 = S12 = α 4 ,
S3 = 1 + α 3 + α 15 + α 18 + α 27 + α 30 = α 11 ,
S4 = S22 = α 8 .
For Step II, we note that

  M2 = [ S1  S2 ] = [ α^2  α^4  ]
       [ S2  S3 ]   [ α^4  α^11 ]

is nonsingular with inverse

  M2^{−1} = [ α^8  α    ] .
            [ α    α^14 ]

Thus ν = 2 errors have been made, and we must solve

  [ S1  S2 ] [ σ2 ]   [ −S3 ]        [ α^2  α^4  ] [ σ2 ]   [ α^11 ]
  [ S2  S3 ] [ σ1 ] = [ −S4 ]   or   [ α^4  α^11 ] [ σ1 ] = [ α^8  ] .
The solution is [σ2 σ1 ]T = M2−1 [α 11 α 8 ]T = [α 14 α 2 ]T . Thus Step II produces the error
locator polynomial σ (x) = 1 + α 2 x + α 14 x 2 . Step III yields the roots α 11 and α 5 of σ (x)
and hence the error location numbers X 1 = α 4 and X 2 = α 10 . As the code is binary, we
skip Step IV. So the error vector is e(x) = x 4 + x 10 , and the transmitted codeword is c(x) =
1 + x + x 4 + x 5 + x 6 + x 9 , which is (1 + x)g(x).
Example 5.4.4 Let C be the code of Example 5.4.3. Suppose that y(x) = 1 + x 2 + x 8 is
received. Then Step I produces S1 = S2 = S4 = 0 and S3 = α^10 . In Step II, the matrix

  M2 = [ S1  S2 ] = [ 0  0    ]
       [ S2  S3 ]   [ 0  α^10 ]
is singular, as is M1 = [S1 ] = [0]. Since the syndromes are not all 0 and we cannot complete
the algorithm, we must conclude that more than two errors were made.
Example 5.4.5 Let C be the binary [15, 5] narrow-sense BCH code of designed distance
δ = 7, which has defining set T = {1, 2, 3, 4, 5, 6, 8, 9, 10, 12}. Using the primitive 15th
root of unity α in Table 5.1, the generator polynomial of C is
g(x) = 1 + x + x 2 + x 4 + x 5 + x 8 + x 10 .
Suppose that using C, y(x) = x + x 4 + x 5 + x 7 + x 9 + x 12 is received. Step I produces
S1 = α^14 , S2 = α^13 , S3 = α^14 , S4 = α^11 , S5 = 1, and S6 = α^13 . The matrix M3 is singular,
and we have

  M2 = [ S1  S2 ] = [ α^14  α^13 ]      and      M2^{−1} = [ α^10  α^9  ] .
       [ S2  S3 ]   [ α^13  α^14 ]                         [ α^9   α^10 ]
Then [σ2 σ1 ]T = M2−1 [α 14 α 11 ]T = [α 6 α 14 ]T . Thus Step II produces the error locator polynomial σ (x) = 1 + α 14 x + α 6 x 2 . Step III yields the roots α 5 and α 4 of σ (x) and hence the
error location numbers X 1 = α 10 and X 2 = α 11 . Skipping Step IV, the error vector is
e(x) = x 10 + x 11 , and the transmitted codeword is c(x) = x + x 4 + x 5 + x 7 + x 9 + x 10 +
x 11 + x 12 , which is (x + x 2 )g(x).
Example 5.4.6 Let C be the [15, 9] narrow-sense RS code over F16 of designed distance
δ = 7, which has defining set T = {1, 2, 3, 4, 5, 6}. Using the primitive 15th root of unity
α in Table 5.1, the generator polynomial of C is
g(x) = (α + x)(α 2 + x) · · · (α 6 + x)
= α 6 + α 9 x + α 6 x 2 + α 4 x 3 + α 14 x 4 + α 10 x 5 + x 6 .
Suppose that a codeword of C is received as
y(x) = α 7 + α 10 x 2 + x 3 + α 2 x 4 + α 5 x 5 + α 4 x 6 + α 4 x 7 + α 7 x 11 .
Step I produces S1 = α^5 , S2 = α^7 , S3 = α^10 , S4 = α^5 , S5 = α^7 , and S6 = α^3 . The matrix
M3 is nonsingular and we have to solve

  [ S1  S2  S3 ] [ σ3 ]   [ −S4 ]        [ α^5   α^7   α^10 ] [ σ3 ]   [ α^5 ]
  [ S2  S3  S4 ] [ σ2 ] = [ −S5 ]   or   [ α^7   α^10  α^5  ] [ σ2 ] = [ α^7 ] .
  [ S3  S4  S5 ] [ σ1 ]   [ −S6 ]        [ α^10  α^5   α^7  ] [ σ1 ]   [ α^3 ]
The solution is σ1 = α 5 , σ2 = α 6 , and σ3 = α 4 . Thus Step II produces the error locator
polynomial σ (x) = 1 + α 5 x + α 6 x 2 + α 4 x 3 . Step III yields the roots α 13 , α 9 , and α 4 of
σ (x) and hence the error location numbers X 1 = α 2 , X 2 = α 6 , and X 3 = α 11 . For Step IV,
solve the first three equations of (5.8) or
α 5 = E 1 α 2 + E 2 α 6 + E 3 α 11 ,
α 7 = E 1 α 4 + E 2 α 12 + E 3 α 7 ,
α 10 = E 1 α 6 + E 2 α 3 + E 3 α 3 .
The solution is E 1 = 1, E 2 = α 3 , and E 3 = α 7 . Thus the error vector is e(x) = x 2 + α 3 x 6 +
α 7 x 11 , and the transmitted codeword is c(x) = α 7 + α 5 x 2 + x 3 + α 2 x 4 + α 5 x 5 + α 7 x 6 +
α 4 x 7 , which is (α + α 4 x)g(x).
Exercise 298 Verify the calculations in Examples 5.4.3, 5.4.4, 5.4.5, and 5.4.6.
Exercise 299 The following vectors were received using the BCH code C of Example 5.4.3.
Correct these received vectors:
(a) y(x) = 1 + x + x 4 + x 5 + x 6 + x 7 + x 10 + x 11 + x 13 ,
(b) y(x) = x + x 4 + x 7 + x 8 + x 11 + x 12 + x 13 ,
(c) y(x) = 1 + x + x 5 .
Exercise 300 The following vectors were received using the BCH code C of Example 5.4.5.
Correct these received vectors:
(a) y(x) = 1 + x 5 + x 6 + x 7 + x 8 + x 12 + x 13 ,
(b) y(x) = 1 + x + x 2 + x 4 + x 7 + x 8 + x 9 + x 13 ,
(c) y(x) = 1 + x + x 2 + x 6 + x 9 + x 10 + x 12 + x 14 .
Exercise 301 The following vectors were received using the RS code C of Example 5.4.6.
Correct these received vectors:
(a) y(x) = α 3 x 3 + α 9 x 4 + α 14 x 5 + α 13 x 6 + α 6 x 7 + α 14 x 8 + α 4 x 9 + α 12 x 10 + α 2 x 11 ,
(b) y(x) = α 6 + α 9 x + α 4 x 3 + α 14 x 4 + α 6 x 6 + α 10 x 7 + α 8 x 8 + α 3 x 9 + α 14 x 10 +
α 4 x 11 .
How can this algorithm be improved? As stated, this algorithm is quite efficient if the
error-correcting capability of the code is rather small. It is not unreasonable to work, even
by hand, with 3 × 3 matrices over finite fields. With computer algebra packages, larger
size matrices can be handled. But when the size of these matrices becomes quite large (i.e.
when the error-correcting capability of the code is very large), Step II becomes very time
consuming. The Berlekamp–Massey Algorithm introduced in the next subsection uses an
iterative approach to compute the error locator polynomial in a more efficient manner when
t is large. There is another method due to Sugiyama, Kasahara, Hirasawa, and Namekawa
[324] that uses the Euclidean Algorithm to find the error locator polynomial; this algorithm
is quite comparable in efficiency with the Berlekamp–Massey Algorithm and is described
in Section 5.4.3. Step III can also be quite time consuming if the code is long. Little seems
to have been done to improve this step although there is a circuit design using Chien search
that is often used; see [18] for a description. Step IV can be accomplished using a technique
due to Forney [86]; see [21].
5.4.2
The Berlekamp–Massey Decoding Algorithm
The Berlekamp–Massey Decoding Algorithm is a modification of the second step of
Peterson–Gorenstein–Zierler decoding. The verification that it works is quite technical
and is omitted; interested readers should consult [18, 21, 22]. Although the algorithm applies to BCH codes, it is simplified if the codes are binary, and we will present only that
case.
We will adopt the same notation as in the previous subsection. In Step II of the Peterson–
Gorenstein–Zierler Algorithm, the error locator polynomial is computed by solving a system
of ν linear equations in ν unknowns, where ν is the number of errors made. If ν is large,
this step is time consuming. For binary codes, the Berlekamp–Massey Algorithm builds the
error locator polynomial by requiring that its coefficients satisfy a set of equations called
the Newton identities rather than (5.10). These identities hold over general fields provided
all error magnitudes are 1, which is precisely the case when the field is F2 . The equations
(5.10) are sometimes called generalized Newton identities. The Newton identities are:
S1 + σ1 = 0,
S2 + σ1 S1 + 2σ2 = 0,
S3 + σ1 S2 + σ2 S1 + 3σ3 = 0,
..
.
Sν + σ1 Sν−1 + · · · + σν−1 S1 + νσν = 0,
and for j > ν:
S j + σ1 S j−1 + · · · + σν S j−ν = 0.
A proof that these identities hold is found in [210] or [50, Theorem 3.3]. It turns out that
we only need to look at the first, third, fifth, . . . of these. For convenience we number these
Newton identities (noting that iσi = σi when i is odd):
(1 ) S1 + σ1 = 0,
(2 ) S3 + σ1 S2 + σ2 S1 + σ3 = 0,
(3 ) S5 + σ1 S4 + σ2 S3 + σ3 S2 + σ4 S1 + σ5 = 0,
..
.
(µ) S2µ−1 + σ1 S2µ−2 + σ2 S2µ−3 + · · · + σ2µ−2 S1 + σ2µ−1 = 0,
..
.
Define a sequence of polynomials σ (µ) (x) of degree dµ indexed by µ as follows:

  σ (µ) (x) = 1 + σ1^{(µ)} x + σ2^{(µ)} x^2 + · · · + σ_{dµ}^{(µ)} x^{dµ} .

The polynomial σ (µ) (x) is calculated to be the minimum degree polynomial whose coefficients σ1^{(µ)} , σ2^{(µ)} , σ3^{(µ)} , . . . satisfy all of the first µ numbered Newton identities. Associated
to each polynomial is its discrepancy Δµ , which measures how far σ (µ) (x) is from satisfying
the (µ + 1)st identity:

  Δµ = S_{2µ+1} + σ1^{(µ)} S_{2µ} + σ2^{(µ)} S_{2µ−1} + · · · + σ_{2µ}^{(µ)} S1 + σ_{2µ+1}^{(µ)} .
We start with two initial polynomials, σ (−1/2) (x) = 1 and σ (0) (x) = 1, and then generate
σ (µ) (x) inductively in a manner that depends on the discrepancy. The discrepancy Δ_{−1/2} = 1
by convention; the remaining discrepancies are calculated. Plugging the coefficients of
σ (0) (x) into identity (1), we obtain S1 (as σ1^{(0)} = 0) and so the discrepancy of σ (0) (x) is
Δ0 = S1 .
We proceed with the first few polynomials to illustrate roughly the ideas involved. Noting
the discrepancy Δ0 = S1 , if σ (0) (x) had an additional term S1 x, the coefficients of this polynomial σ (0) (x) + S1 x = 1 + S1 x would satisfy identity (1) since S1 + S1 = 0. So σ (1) (x) =
1 + S1 x. Plugging the coefficients of σ (1) (x) into (2), we have Δ1 = S3 + σ1^{(1)} S2 =
S3 + S1 S2 . If Δ1 = 0, then σ (1) (x) satisfies (2) also. If Δ1 ≠ 0 and if S1 ≠ 0, then setting
σ (2) (x) = σ (1) (x) + (S3 + S1 S2 )S1^{−1} x^2 = σ (1) (x) + Δ1 Δ0^{−1} x^2 , we see that this polynomial
satisfies (1) and (2). If Δ1 ≠ 0 but S1 = 0, then σ (1) (x) = 1, Δ1 = S3 , and the lowest degree
polynomial that will satisfy (1) and (2) is σ (2) (x) = σ (1) (x) + S3 x^3 = σ (1) (x) + Δ1 x^3 . The
choices get more complicated as the process continues but, remarkably, the Berlekamp–
Massey Algorithm reduces each stage down to one of two choices.
The Berlekamp–Massey Algorithm for computing an error locator polynomial for binary
BCH codes is the following iterative algorithm that begins with µ = 0 and terminates when
σ (t) (x) is computed:
I. If Δµ = 0, then
σ (µ+1) (x) = σ (µ) (x).
II. If Δµ ≠ 0, find a value −(1/2) ≤ ρ < µ such that Δρ ≠ 0 and 2ρ − dρ is as large as
possible. Then
σ (µ+1) (x) = σ (µ) (x) + Δµ Δρ^{−1} x^{2(µ−ρ)} σ (ρ) (x).
The error locator polynomial is σ (x) = σ (t) (x); if this polynomial has degree greater than
t, more than t errors have been made, and the decoder should declare the received vector is
uncorrectable. Once the error locator polynomial is determined, one of course proceeds as in
the Peterson–Gorenstein–Zierler Algorithm to complete the decoding. We now reexamine
Examples 5.4.3, 5.4.4, and 5.4.5 using the Berlekamp–Massey Algorithm. It is helpful in
keeping track of the computations to fill out the following table.
  µ        σ (µ) (x)     Δµ     dµ     2µ − dµ
  −1/2     1             1      0      −1
  0        1             S1     0      0
  1
  .
  .
  t
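The iteration can also be written out in code. The sketch below is our own rendering of rules I and II (µ = −1/2 is stored as 2µ = −1, syndromes are passed as integers via Table 5.1, and arithmetic is in F16 with α^4 = 1 + α); it tracks the same (µ, σ (µ) (x), Δµ) data as the table.

```python
# F_16 tables, alpha^4 = 1 + alpha
exp = []
x = 1
for _ in range(15):
    exp.append(x)
    x <<= 1
    if x & 0x10:
        x ^= 0x13
log = {exp[i]: i for i in range(15)}
def mul(a, b): return 0 if 0 in (a, b) else exp[(log[a] + log[b]) % 15]
def inv(a): return exp[(-log[a]) % 15]

def berlekamp_massey_binary(S, t):
    """Sigma(x) from syndromes S = [S_1, ..., S_2t]; states are (2*mu, sigma, Delta_mu)."""
    def discrepancy(sigma, mu):
        # Delta_mu = S_{2mu+1} + sigma_1 S_{2mu} + ... + sigma_{2mu} S_1 + sigma_{2mu+1}
        d = 0
        for j in range(2 * mu + 2):
            coef = sigma[j] if j < len(sigma) else 0
            d ^= coef if j == 2 * mu + 1 else mul(coef, S[2 * mu - j])  # S_i is S[i-1]
        return d
    states = [(-1, [1], 1), (0, [1], S[0])]        # mu = -1/2 and mu = 0
    for mu in range(t):
        two_mu, sigma, delta = states[-1]
        if delta == 0:                             # rule I
            new = list(sigma)
        else:                                      # rule II: earlier rho maximizing 2rho - d_rho
            two_rho, rho_sig, rho_delta = max(
                (s for s in states[:-1] if s[2] != 0),
                key=lambda s: s[0] - (len(s[1]) - 1))
            shift, factor = two_mu - two_rho, mul(delta, inv(rho_delta))
            new = list(sigma) + [0] * max(0, shift + len(rho_sig) - len(sigma))
            for j, c in enumerate(rho_sig):
                new[shift + j] ^= mul(factor, c)
        while len(new) > 1 and new[-1] == 0:
            new.pop()
        nd = discrepancy(new, mu + 1) if mu + 1 < t else None
        states.append((two_mu + 2, new, nd))
    return states[-1][1]

# syndromes S_1..S_4 of Example 5.4.7 as integers via Table 5.1:
# alpha^2 = 4, alpha^4 = 3, alpha^11 = 14, alpha^8 = 5
print(berlekamp_massey_binary([4, 3, 14, 5], 2))
```

For these syndromes the result is 1 + α^2 x + α^14 x^2, the error locator polynomial of Example 5.4.7.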
Example 5.4.7 We recompute σ (x) from Example 5.4.3, using Table 5.1. In that example
t = 2, and the syndromes are S1 = α 2 , S2 = α 4 , S3 = α 11 , and S4 = α 8 . We obtain the table
  µ        σ (µ) (x)                 Δµ      dµ     2µ − dµ
  −1/2     1                         1       0      −1
  0        1                         α^2     0      0
  1        1 + α^2 x                 α       1      1
  2        1 + α^2 x + α^14 x^2
We explain how this table was obtained. Observe that Δ0 = S1 = α^2 and so II is used in
computing σ (1) (x). We must choose ρ < 0 and the only choice is ρ = −1/2. So

  σ (1) (x) = σ (0) (x) + Δ0 Δ_{−1/2}^{−1} x^{2(0+1/2)} σ (−1/2) (x) = 1 + α^2 x.
After computing Δ1 = S3 + σ1^{(1)} S2 + σ2^{(1)} S1 + σ3^{(1)} = α^11 + α^2 α^4 = α, we again use II
to find σ (2) (x). We must find ρ < 1 with Δρ ≠ 0 and 2ρ − dρ as large as possible.
Thus ρ = 0 and

  σ (2) (x) = σ (1) (x) + Δ1 Δ0^{−1} x^{2(1−0)} σ (0) (x) = 1 + α^2 x + α^14 x^2 .

As t = 2, σ (x) = σ (2) (x) = 1 + α^2 x + α^14 x^2 , which agrees with the result of Example
5.4.3.
Example 5.4.8 We recompute σ (x) from Example 5.4.4 where t = 2, S1 = S2 = S4 = 0,
and S3 = α 10 . We obtain the table
  µ        σ (µ) (x)        Δµ      dµ     2µ − dµ
  −1/2     1                1       0      −1
  0        1                0       0      0
  1        1                α^10    0      2
  2        1 + α^10 x^3
Since Δ0 = S1 = 0, I is used to compute σ (1) (x), yielding σ (1) (x) = σ (0) (x) = 1. Then
Δ1 = S3 + σ1^{(0)} S2 + σ2^{(0)} S1 + σ3^{(0)} = α^10 . So we use II to find σ (2) (x). We must find ρ < 1
with Δρ ≠ 0 and 2ρ − dρ as large as possible. Thus ρ = −1/2 and

  σ (2) (x) = σ (1) (x) + Δ1 Δ_{−1/2}^{−1} x^{2(1+1/2)} σ (−1/2) (x) = 1 + α^10 x^3 .

So σ (x) = σ (2) (x) = 1 + α^10 x^3 , which has degree greater than t = 2; hence the received
vector is uncorrectable, which agrees with Example 5.4.4.
Example 5.4.9 We recompute σ (x) from Example 5.4.5 where t = 3 and the syndromes
are S1 = α 14 , S2 = α 13 , S3 = α 14 , S4 = α 11 , S5 = 1, and S6 = α 13 . We obtain
  µ        σ (µ) (x)                  Δµ      dµ     2µ − dµ
  −1/2     1                          1       0      −1
  0        1                          α^14    0      0
  1        1 + α^14 x                 α^5     1      1
  2        1 + α^14 x + α^6 x^2       0       2      2
  3        1 + α^14 x + α^6 x^2
As Δ0 = S1 = α^14 , II is used to compute σ (1) (x). As ρ < 0, ρ = −1/2 yielding

  σ (1) (x) = σ (0) (x) + Δ0 Δ_{−1/2}^{−1} x^{2(0+1/2)} σ (−1/2) (x) = 1 + α^14 x.

Since Δ1 = S3 + σ1^{(1)} S2 + σ2^{(1)} S1 + σ3^{(1)} = α^14 + α^14 α^13 = α^5 , we again use II to find
σ (2) (x). We choose ρ < 1 with Δρ ≠ 0 and 2ρ − dρ as large as possible. Thus ρ = 0 and

  σ (2) (x) = σ (1) (x) + Δ1 Δ0^{−1} x^{2(1−0)} σ (0) (x) = 1 + α^14 x + α^6 x^2 .

Now Δ2 = S5 + σ1^{(2)} S4 + σ2^{(2)} S3 + σ3^{(2)} S2 + σ4^{(2)} S1 + σ5^{(2)} = 1 + α^14 α^11 + α^6 α^14 = 0. So
to compute σ (3) (x), use I to obtain σ (3) (x) = σ (2) (x); thus σ (x) = 1 + α^14 x + α^6 x^2 , agreeing with Example 5.4.5.
Exercise 302 Recompute the error locator polynomials from Exercise 299 using the
Berlekamp–Massey Algorithm.
[Figure 5.1 depicts a linear feedback shift-register: register cells holding S j−1 , S j−2 , . . . , S j−n feed multipliers −σ1 , −σ2 , . . . , −σn whose outputs are summed to produce S j , while S j−n−1 , S j−n−2 , . . . , S1 are shifted out.]
Figure 5.1 Linear feedback shift-register.
Exercise 303 Recompute the error locator polynomials from Exercise 300 using the
Berlekamp–Massey Algorithm.
As stated earlier, this decoding algorithm for BCH codes over arbitrary fields was first
developed by Berlekamp in the first edition of [18]. Shortly after, Massey [224] showed
that this decoding algorithm actually gives the shortest length recurrence relation which
generates the (finite or infinite) sequence S1 , S2 , . . . whether or not this sequence comes
from syndromes. This is the same as the minimum length n of a linear feedback shift-register
that generates the entire sequence when S1 , . . . , Sn is the initial contents of the shift-register.
In this context, the algorithm produces a sequence of polynomials σ (i) (x) associated with
a shift-register which generates S_1, . . . , S_i. The discrepancy d_i of σ^{(i)}(x) measures how
close the shift-register also comes to generating S_{i+1}. If the discrepancy is 0, then the shift-register also generates S_{i+1}. Otherwise, the degree of the polynomial must be increased
with a new longer associated shift-register. In the end, the algorithm produces a polynomial
σ(x) = 1 + ∑_{i=1}^{n} σ_i x^i, called the connection polynomial, leading to the shift-register of
Figure 5.1.
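The iteration just described is short to state in code. Below is a minimal Python sketch of the Berlekamp–Massey algorithm in Massey's one-syndrome-at-a-time form (rather than the two-at-a-time binary form used in the worked examples above), over F_16 built from the primitive polynomial x^4 + x + 1 — an assumption consistent with the field arithmetic in this chapter's examples. Both forms produce the same connection polynomial for the syndrome sequences of Examples 5.4.4 and 5.4.5.

```python
# Berlekamp-Massey over GF(16); field built from x^4 + x + 1 (assumed).
# Field elements are 4-bit integers; addition is XOR.

EXP = [0] * 30                     # antilog table, doubled to avoid reduction mod 15
f = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = f
    f <<= 1
    if f & 0x10:
        f ^= 0b10011               # reduce modulo x^4 + x + 1
LOG = {EXP[i]: i for i in range(15)}

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    return EXP[(15 - LOG[a]) % 15]

def berlekamp_massey(S):
    """Shortest connection polynomial sigma (constant term 1, lowest
    degree first) generating the sequence S over GF(16)."""
    sigma, prev = [1], [1]         # current / previous connection polynomials
    L, m, d_prev = 0, 1, 1         # register length, gap since last change, last discrepancy
    for n, s in enumerate(S):
        d = s                      # discrepancy: S_n minus the register's prediction
        for i in range(1, L + 1):
            d ^= gf_mul(sigma[i], S[n - i])
        if d == 0:
            m += 1
            continue
        coef = gf_mul(d, gf_inv(d_prev))
        new = sigma + [0] * max(0, len(prev) + m - len(sigma))
        for i, p in enumerate(prev):
            new[i + m] ^= gf_mul(coef, p)   # sigma <- sigma - (d/d_prev) x^m prev
        if 2 * L <= n:             # the register must lengthen
            L, prev, d_prev, m = n + 1 - L, sigma, d, 1
        else:
            m += 1
        sigma = new
    return sigma

a = lambda i: EXP[i % 15]
# Syndromes of Example 5.4.5: sigma(x) = 1 + a^14 x + a^6 x^2
print(berlekamp_massey([a(14), a(13), a(14), a(11), 1, a(13)]))
# Syndromes of Example 5.4.4: sigma(x) = 1 + a^10 x^3
print(berlekamp_massey([0, 0, a(10), 0]))
```

The second call returns a connection polynomial of degree 3 > t = 2, signalling, as in Example 5.4.4, an uncorrectable received vector.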
5.4.3 The Sugiyama Decoding Algorithm
The Sugiyama Algorithm is another method to find the error locator polynomial, and thus
presents another alternative to complete Step II of the Peterson–Gorenstein–Zierler Algorithm. This algorithm, developed in [324], applies to a class of codes called Goppa codes that
include BCH codes as a subclass. This algorithm is a relatively simple, but clever, application
of the Euclidean Algorithm. We will only study the algorithm as applied to BCH codes.
Recall that the error locator polynomial σ(x) is defined as ∏_{j=1}^{ν} (1 − x X_j). The error
evaluator polynomial ω(x) is defined as

ω(x) = ∑_{j=1}^{ν} E_j X_j ∏_{i=1, i≠j}^{ν} (1 − x X_i) = ∑_{j=1}^{ν} E_j X_j σ(x)/(1 − x X_j).    (5.12)
Note that deg(σ(x)) = ν and deg(ω(x)) ≤ ν − 1. We define the polynomial S(x) of degree
at most 2t − 1 by

S(x) = ∑_{i=0}^{2t−1} S_{i+1} x^i,

where S_i for 1 ≤ i ≤ 2t are the syndromes of the received vector.
191
5.4 Decoding BCH codes
Expanding the right-hand side of (5.12) in a formal power series and using (5.7) together
with the definition of S(x), we obtain

ω(x) = σ(x) ∑_{j=1}^{ν} E_j X_j · 1/(1 − x X_j) = σ(x) ∑_{j=1}^{ν} E_j X_j ∑_{i=0}^{∞} (x X_j)^i
     = σ(x) ∑_{i=0}^{∞} ( ∑_{j=1}^{ν} E_j X_j^{i+1} ) x^i ≡ σ(x) ∑_{i=0}^{2t−1} ( ∑_{j=1}^{ν} E_j X_j^{i+1} ) x^i (mod x^{2t})
     ≡ σ(x)S(x) (mod x^{2t}).
Therefore we have what is termed the key equation
ω(x) ≡ σ(x)S(x) (mod x^{2t}).
Exercise 304 You may wonder if using power series to derive the key equation is legitimate.
Give a non-power series derivation. Hint:

σ(x) · 1/(1 − x X_j) = σ(x) (1 − x^{2t} X_j^{2t})/(1 − x X_j) + x^{2t} X_j^{2t} ∏_{i=1, i≠j}^{ν} (1 − x X_i).
The following observation about σ (x) and ω(x) will be important later.
Lemma 5.4.10 The polynomials σ (x) and ω(x) are relatively prime.
Proof: The roots of σ(x) are precisely X_j^{−1} for 1 ≤ j ≤ ν. But

ω(X_j^{−1}) = E_j X_j ∏_{i=1, i≠j}^{ν} (1 − X_j^{−1} X_i) ≠ 0,

proving the lemma.
The Sugiyama Algorithm uses the Euclidean Algorithm to solve the key equation. The
Sugiyama Algorithm is as follows.
I. Suppose that f(x) = x^{2t} and s(x) = S(x). Set r_{−1}(x) = f(x), r_0(x) = s(x), b_{−1}(x) = 0, and b_0(x) = 1.
II. Repeat the following two computations, finding h_i(x), r_i(x), and b_i(x) inductively for
i = 1, 2, . . . , I, until I satisfies deg r_{I−1}(x) ≥ t and deg r_I(x) < t:
    r_{i−2}(x) = r_{i−1}(x)h_i(x) + r_i(x), where deg r_i(x) < deg r_{i−1}(x),
    b_i(x) = b_{i−2}(x) − h_i(x)b_{i−1}(x).
III. σ(x) is some nonzero scalar multiple of b_I(x).
Note that I from Step II is well-defined as deg ri (x) is a strictly decreasing sequence
with deg r−1 (x) > t. In order to prove that the Sugiyama Algorithm works, we need the
following lemma.
Lemma 5.4.11 Using the notation of the Sugiyama Algorithm, let a−1 (x) = 1, a0 (x) = 0,
and ai (x) = ai−2 (x) − h i (x)ai−1 (x) for i ≥ 1. The following hold.
(i) a_i(x) f(x) + b_i(x)s(x) = r_i(x) for i ≥ −1.
(ii) b_i(x)r_{i−1}(x) − b_{i−1}(x)r_i(x) = (−1)^i f(x) for i ≥ 0.
(iii) a_i(x)b_{i−1}(x) − a_{i−1}(x)b_i(x) = (−1)^{i+1} for i ≥ 0.
(iv) deg b_i(x) + deg r_{i−1}(x) = deg f(x) for i ≥ 0.
Proof: All of these are proved by induction. For (i), the cases i = −1 and i = 0 follow
directly from the initial values set in Step I of the Sugiyama Algorithm and the values
a−1 (x) = 1 and a0 (x) = 0. Assuming (i) holds with i replaced by i − 1 and i − 2, we have
ai (x) f (x) + bi (x)s(x) = [ai−2 (x) − h i (x)ai−1 (x)] f (x)
+ [bi−2 (x) − h i (x)bi−1 (x)]s(x)
= ai−2 (x) f (x) + bi−2 (x)s(x)
− h i (x)[ai−1 (x) f (x) + bi−1 (x)s(x)]
= ri−2 (x) − h i (x)ri−1 (x) = ri (x),
completing (i).
Again when i = 0, (ii) follows from Step I of the Sugiyama Algorithm. Assume (ii) holds
with i replaced by i − 1. Then
bi (x)ri−1 (x) − bi−1 (x)ri (x) = [bi−2 (x) − h i (x)bi−1 (x)]ri−1 (x) − bi−1 (x)ri (x)
= bi−2 (x)ri−1 (x) − bi−1 (x)[h i (x)ri−1 (x) + ri (x)]
= bi−2 (x)ri−1 (x) − bi−1 (x)ri−2 (x)
= −(−1)i−1 f (x) = (−1)i f (x),
verifying (ii).
When i = 0, (iii) follows from Step I of the Sugiyama Algorithm, a−1 (x) = 1, and
a0 (x) = 0. Assume (iii) holds with i replaced by i − 1. Then
ai (x)bi−1 (x) − ai−1 (x)bi (x) = [ai−2 (x) − h i (x)ai−1 (x)]bi−1 (x)
− ai−1 (x)[bi−2 (x) − h i (x)bi−1 (x)]
= −[ai−1 (x)bi−2 (x) − ai−2 (x)bi−1 (x)]
= −(−1)i = (−1)i+1 ,
proving (iii).
When i = 0, (iv) follows again from Step I of the Sugiyama Algorithm. Assume (iv)
holds with i replaced by i − 1, that is, deg bi−1 (x) + deg ri−2 (x) = deg f (x). In Step
II of the Sugiyama Algorithm, we have deg ri (x) < deg ri−2 (x). So deg(bi−1 (x)ri (x)) =
deg bi−1 (x) + deg ri (x) < deg f (x) implying (iv) for case i using part (ii).
We now verify that the Sugiyama Algorithm works. By Lemma 5.4.11(i) we have
a_I(x)x^{2t} + b_I(x)S(x) = r_I(x).    (5.13)
From the key equation, we also know that
a(x)x^{2t} + σ(x)S(x) = ω(x)    (5.14)
for some polynomial a(x). Multiply (5.13) by σ (x) and (5.14) by b I (x) to obtain
a_I(x)σ(x)x^{2t} + b_I(x)σ(x)S(x) = r_I(x)σ(x)    (5.15)

and

a(x)b_I(x)x^{2t} + σ(x)b_I(x)S(x) = ω(x)b_I(x).    (5.16)
Modulo x^{2t} these imply that

r_I(x)σ(x) ≡ ω(x)b_I(x) (mod x^{2t}).    (5.17)
As deg σ(x) ≤ t, by the choice of I, deg(r_I(x)σ(x)) = deg r_I(x) + deg σ(x) < t + t = 2t.
By Lemma 5.4.11(iv), the choice of I, and the fact that deg ω(x) < t, deg(ω(x)b_I(x)) =
deg ω(x) + deg b_I(x) < t + deg b_I(x) = t + (deg x^{2t} − deg r_{I−1}(x)) ≤ 3t − t = 2t.
Therefore (5.17) implies that r_I(x)σ(x) = ω(x)b_I(x). This, together with (5.15) and
(5.16), shows that
a_I(x)σ(x) = a(x)b_I(x).    (5.18)
However, Lemma 5.4.11(iii) implies that a I (x) and b I (x) are relatively prime and hence
a(x) = λ(x)a I (x) by (5.18). Substituting this into (5.18) produces
σ(x) = λ(x)b_I(x).    (5.19)
Plugging these into (5.14) we obtain λ(x)a I (x)x 2t + λ(x)b I (x)S(x) = ω(x). Thus (5.13)
implies that
ω(x) = λ(x)r_I(x).    (5.20)
By Lemma 5.4.10, (5.19), and (5.20), λ(x) must be a nonzero constant, verifying Step III
of the Sugiyama Algorithm.
Since we are only interested in the roots of σ (x), it suffices to find the roots of b I (x)
produced in Step II; this gives the desired error location numbers.
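Since the algorithm is just the Euclidean Algorithm with bookkeeping, it is short to implement. The Python sketch below (with F_16 built from the primitive polynomial x^4 + x + 1, an assumption matching this chapter's field tables; polynomials are coefficient lists, constant term first) mechanically reproduces the b_I(x) columns of Examples 5.4.12–5.4.14.

```python
# Step II of the Sugiyama Algorithm: run Euclid on f(x) = x^{2t} and
# s(x) = S(x) over GF(16) until deg r_I < t, tracking b_i(x).
# GF(16) is assumed built from x^4 + x + 1; addition is XOR.

EXP = [0] * 30
f = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = f
    f <<= 1
    if f & 0x10:
        f ^= 0b10011                      # reduce modulo x^4 + x + 1
LOG = {EXP[i]: i for i in range(15)}

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def deg(p):
    return max((i for i, c in enumerate(p) if c), default=-1)

def poly_add(p, r):                        # characteristic 2: add = XOR
    out = [0] * max(len(p), len(r))
    for i, c in enumerate(p): out[i] ^= c
    for i, c in enumerate(r): out[i] ^= c
    return out

def poly_mul(p, r):
    out = [0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(r):
            out[i + j] ^= mul(a, b)
    return out

def poly_divmod(num, den):
    num, q = num[:], [0] * len(num)
    while deg(num) >= deg(den):
        shift = deg(num) - deg(den)
        coef = mul(num[deg(num)], EXP[(15 - LOG[den[deg(den)]]) % 15])
        q[shift] ^= coef
        for i in range(deg(den) + 1):
            num[i + shift] ^= mul(coef, den[i])
    return q, num

def sugiyama(S, t):
    """Return b_I(x), a nonzero scalar multiple of the error locator."""
    r_prev, r = [0] * (2 * t) + [1], S[:]      # r_{-1} = x^{2t}, r_0 = S(x)
    b_prev, b = [0], [1]                       # b_{-1} = 0,      b_0 = 1
    while deg(r) >= t:
        h, rem = poly_divmod(r_prev, r)
        r_prev, r = r, rem
        b_prev, b = b, poly_add(b_prev, poly_mul(h, b))
    return b[:deg(b) + 1]

a = lambda i: EXP[i % 15]
# Example 5.4.12: t = 2, S(x) = a^2 + a^4 x + a^11 x^2 + a^8 x^3
print(sugiyama([a(2), a(4), a(11), a(8)], 2))   # b_2(x) = 1 + a^2 x + a^14 x^2
```

The same routine applied to the syndromes of Examples 5.4.13 and 5.4.14 returns α^5x^2 and α^{13} + α^{12}x + α^4x^2 respectively, matching the tables below.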
Example 5.4.12 We obtain a scalar multiple of σ (x) from Example 5.4.3, using the
Sugiyama Algorithm and Table 5.1. In that example t = 2, and the syndromes are S1 = α 2 ,
S2 = α 4 , S3 = α 11 , and S4 = α 8 . The following table summarizes the results.
i      r_i(x)                                    h_i(x)            b_i(x)
−1     x^4                                                         0
0      α^8x^3 + α^{11}x^2 + α^4x + α^2                             1
1      αx^2 + α^4x + α^{12}                      α^7x + α^{10}     α^7x + α^{10}
2      α^2                                       α^7x              α^{14}x^2 + α^2x + 1

The first index I where deg r_I(x) < t = 2 is I = 2. Hence σ(x) is a multiple of b_2(x) =
α^{14}x^2 + α^2x + 1; in fact from Example 5.4.3, b_2(x) = σ(x).
Example 5.4.13 Using the Sugiyama Algorithm we examine Example 5.4.4 where t = 2,
S1 = S2 = S4 = 0, and S3 = α 10 . The following table summarizes the computations.
i      r_i(x)         h_i(x)     b_i(x)
−1     x^4                       0
0      α^{10}x^2                 1
1      0              α^5x^2     α^5x^2
The first index I where deg r I (x) < t = 2 is I = 1. But in this case b1 (x) = α 5 x 2 , which
has 0 for its roots indicating that more than two errors were made, in agreement with
Example 5.4.4. Note also that r1 (x) = 0 implies by (5.20) that ω(x) = 0, which is obviously
impossible as σ (x) and ω(x) are supposed to be relatively prime by Lemma 5.4.10.
Example 5.4.14 We obtain a scalar multiple of σ (x) from Example 5.4.5 using the
Sugiyama Algorithm. Here t = 3 and S1 = α 14 , S2 = α 13 , S3 = α 14 , S4 = α 11 , S5 = 1,
and S6 = α 13 . The following table summarizes the results.
i      r_i(x)                                                      h_i(x)          b_i(x)
−1     x^6                                                                         0
0      α^{13}x^5 + x^4 + α^{11}x^3 + α^{14}x^2 + α^{13}x + α^{14}                  1
1      α^{11}x^4 + α^4x^3 + α^{14}x^2 + α^5x + α^3                 α^2x + α^4      α^2x + α^4
2      α^{12}                                                      α^2x + α^2      α^4x^2 + α^{12}x + α^{13}
The first index I where deg r I (x) < t = 3 is I = 2. Hence σ (x) is a multiple of b2 (x) =
α 4 x 2 + α 12 x + α 13 . From Example 5.4.5, σ (x) = α 2 b2 (x).
Exercise 305 Verify the calculations in Examples 5.4.12, 5.4.13, and 5.4.14.
Exercise 306 Using the Sugiyama Algorithm, find a scalar multiple of the error locator
polynomial from Example 5.4.6.
Exercise 307 Using the Sugiyama Algorithm, find scalar multiples of the error locator
polynomials from Exercise 299.
Exercise 308 Using the Sugiyama Algorithm, find scalar multiples of the error locator
polynomials from Exercise 300.
Exercise 309 Using the Sugiyama Algorithm, find scalar multiples of the error locator
polynomials from Exercise 301.
We remark that the Sugiyama Algorithm applies with other choices for f (x) and s(x),
with an appropriate modification of the condition under which the algorithm stops in Step
II. Such a modification works for decoding Goppa codes; see [232].
It is worth noting that the Peterson–Gorenstein–Zierler, the Berlekamp–Massey, or the
Sugiyama Algorithm can be used to decode any cyclic code up to the BCH Bound. Let C
be a cyclic code with defining set T and suppose that T contains δ consecutive elements
{b, b + 1, . . . , b + δ − 2}. Let B be the BCH code with defining set Cb ∪ Cb+1 ∪ · · · ∪
Cb+δ−2 , which is a subset of T . By Exercise 239, C ⊆ B. Let t = ⌊(δ − 1)/2⌋. Suppose that
a codeword c(x) ∈ C is transmitted and y(x) is received where t or fewer errors have been
made. Then c(x) ∈ B and any of the decoding algorithms applied to B will correct y(x)
and produce c(x). Thus these algorithms will correct a received word in any cyclic code
provided that if ν errors are made, 2ν + 1 does not exceed the BCH Bound of the code. Of
course this number of errors may be less than the actual number of errors that C is capable
of correcting.
5.4.4 The Sudan–Guruswami Decoding Algorithm
In a 1997 paper Madhu Sudan [323] developed a procedure for decoding [n, k, d] Reed–
Solomon codes that is capable of correcting some e errors where e > ⌊(d − 1)/2⌋. This
method was extended by Guruswami and Sudan [113] to remove certain restrictions in the
original Sudan Algorithm. To be able to correct e errors where e > ⌊(d − 1)/2⌋, the algorithm produces a list of all possible codewords within Hamming distance e of any received
vector; such an algorithm is called a list-decoding algorithm. The Sudan–Guruswami Algorithm applies to generalized Reed–Solomon codes as well as certain BCH and algebraic
geometry codes. In this section we present this algorithm for generalized Reed–Solomon
codes and refer the interested reader to [113] for the other codes. The Sudan–Guruswami
Algorithm has itself been generalized by Kötter and Vardy [179] to apply to soft decision
decoding.
To prepare for the algorithm we need some preliminary notation involving polynomials in two variables. Suppose x and y are independent indeterminates and p(x, y) =
∑_{i,j} p_{i,j} x^i y^j is a polynomial in F_q[x, y], the ring of all polynomials in the two variables
x and y. Let w_x and w_y be nonnegative real numbers. The (w_x, w_y)-weighted degree of
p(x, y) is defined to be

max{w_x i + w_y j | p_{i,j} ≠ 0}.
Notice that the (1, 1)-weighted degree of p(x, y) is merely the degree of p(x, y). For
positive integers s and δ, let N_s(δ) denote the number of monomials x^i y^j whose (1, s)-weighted
degree is δ or less. We say that the point (α, β) ∈ F_q^2 lies on or is a root of p(x, y)
provided p(α, β) = 0. We will need the multiplicity of this root. To motivate the definition of
multiplicity, recall that if f(x) ∈ F_q[x] and α is a root of f(x), then its multiplicity as a root
of f(x) is the number m where f(x) = (x − α)^m g(x) for some g(x) ∈ F_q[x] with g(α) ≠ 0.
When working with two variables we cannot generalize this notion directly. Notice,
however, that f(x + α) = x^m h(x), where h(x) = g(x + α); also h(0) ≠ 0. In particular
f(x + α) contains a monomial of degree m but none of smaller degree. This concept can
be generalized. The root (α, β) of the polynomial p(x, y) has multiplicity m provided the
shifted polynomial p(x + α, y + β) contains a monomial of degree m but no monomial of
lower degree.
Exercise 310 Let p(x, y) = 1 + x + y − x^2 − y^2 − 2x^2y + xy^2 − y^3 + x^4 − 2x^3y −
x^2y^2 + 2xy^3 ∈ F_5[x, y]. Show that (1, 2) ∈ F_5^2 is a root of p(x, y) with multiplicity 3.
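Exercise 310 can be checked by brute force: shift the polynomial and look for the smallest total degree carrying a nonzero coefficient. The sketch below expands p(x + 1, y + 2) over F_5 using the binomial theorem (the same computation that Lemma 5.4.17 later in this section makes explicit).

```python
# Brute-force check of Exercise 310: expand p(x + 1, y + 2) over F_5
# and find the smallest total degree with a nonzero coefficient --
# by definition, the multiplicity of the root (1, 2).
from math import comb

q = 5
# coefficients p[(i, j)] of x^i y^j in the polynomial of Exercise 310
p = {(0, 0): 1, (1, 0): 1, (0, 1): 1, (2, 0): -1, (0, 2): -1, (2, 1): -2,
     (1, 2): 1, (0, 3): -1, (4, 0): 1, (3, 1): -2, (2, 2): -1, (1, 3): 2}

def shift(poly, alpha, beta):
    """Coefficients of poly(x + alpha, y + beta), reduced modulo q."""
    out = {}
    for (j, l), c in poly.items():
        for a in range(j + 1):
            for b in range(l + 1):
                out[(a, b)] = (out.get((a, b), 0) + comb(j, a) * comb(l, b)
                               * pow(alpha, j - a, q) * pow(beta, l - b, q) * c) % q
    return out

shifted = shift(p, 1, 2)
mult = min(i + j for (i, j), c in shifted.items() if c)
print(mult)   # 3
```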
Recall that an [n, k] generalized Reed–Solomon code over F_q is defined by

GRS_k(γ, v) = {(v_0 f(γ_0), v_1 f(γ_1), . . . , v_{n−1} f(γ_{n−1})) | f ∈ P_k},

where γ = (γ_0, γ_1, . . . , γ_{n−1}) is an n-tuple of distinct elements of F_q, v = (v_0, v_1, . . . , v_{n−1})
is an n-tuple of nonzero elements of F_q, and P_k is the set of polynomials in F_q[x] of degree
k − 1 or less including the zero polynomial. Suppose that c = c_0c_1 · · · c_{n−1} ∈ GRS_k(γ, v)
is sent and y′ = y′_0 y′_1 · · · y′_{n−1} = c + e is received. Then there is a unique f ∈ P_k such that
c_i = v_i f(γ_i) for 0 ≤ i ≤ n − 1. We can find c if we can determine the polynomial f. Let
A = {(γ_0, y_0), (γ_1, y_1), . . . , (γ_{n−1}, y_{n−1})}, where y_i = y′_i/v_i. Suppose for a moment that
no errors occurred in the transmission of c. Then y_i = c_i/v_i = f(γ_i) for 0 ≤ i ≤ n − 1.
In particular, all points of A lie on the polynomial p(x, y) = y − f(x). Now suppose
that errors occur. Varying slightly our terminology from earlier, define an error locator
polynomial Λ(x, y) to be any polynomial in F_q[x, y] such that Λ(γ_i, y_i) = 0 for all i such
that y_i ≠ c_i/v_i. Since y_i − f(γ_i) = 0 if y_i = c_i/v_i, all points of A lie on the polynomial
p(x, y) = Λ(x, y)(y − f(x)). The basic idea of the Sudan–Guruswami Algorithm is to find
a polynomial p(x, y) ∈ F_q[x, y] where each element of A is a root with a certain multiplicity and then find the factors of that polynomial of the form y − f(x). Further restrictions
on p(x, y) are imposed to guarantee the error-correcting capability of the algorithm.
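The definition of GRS_k(γ, v) can be made concrete with a few lines of code. The sketch below builds a small generalized Reed–Solomon code over the prime field F_13 — the field, γ, and v are arbitrary illustrative choices, not tied to any example in the text — and confirms by exhaustive search that its minimum distance is n − k + 1, the MDS property used throughout this section.

```python
# A tiny GRS code over the prime field F_13 (illustrative parameters),
# checked to be MDS: minimum distance n - k + 1.
from itertools import product

q, n, k = 13, 6, 3
gamma = [0, 1, 2, 3, 4, 5]          # distinct field elements
v = [1, 2, 3, 4, 5, 6]              # nonzero multipliers

def encode(f):
    """Codeword (v_i * f(gamma_i)) for f = (f_0, ..., f_{k-1}), deg f <= k - 1."""
    return tuple(v[i] * sum(c * pow(gamma[i], e, q) for e, c in enumerate(f)) % q
                 for i in range(n))

words = {encode(f) for f in product(range(q), repeat=k)}
dmin = min(sum(x != 0 for x in w) for w in words if any(w))
print(len(words), dmin)   # 2197 codewords, minimum distance 4 = n - k + 1
```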
The Sudan–Guruswami Decoding Algorithm for the [n, k, n − k + 1] code GRS_k(γ, v)
is:
I. Fix a positive integer m. Pick δ to be the smallest positive integer to satisfy

   nm(m + 1)/2 < N_{k−1}(δ).    (5.21)

   Recall that N_{k−1}(δ) is the number of monomials x^i y^j whose (1, k − 1)-weighted degree
   is δ or less. Set t = ⌊δ/m⌋ + 1.
II. Construct a nonzero polynomial p(x, y) ∈ F_q[x, y] such that each element of A is a
    root of p(x, y) of multiplicity at least m and p(x, y) has (1, k − 1)-weighted degree at
    most δ.
III. Find all factors of p(x, y) of the form y − f(x) where f(x) ∈ P_k and f(γ_i) = y_i for
    at least t of the γ_i. For each such f produce the corresponding codeword in GRS_k(γ, v).
We must verify that this algorithm works and give a bound on the number of errors that
it will correct. The following three lemmas are needed.
Lemma 5.4.15 Let (α, β) ∈ F_q^2 be a root of p(x, y) ∈ F_q[x, y] of multiplicity m or more.
If f(x) is a polynomial in F_q[x] such that f(α) = β, then g(x) = p(x, f(x)) ∈ F_q[x] is
divisible by (x − α)^m.
Proof: Let f 1 (x) = f (x + α) − β. By our hypothesis, f 1 (0) = 0, and so f 1 (x) = x f 2 (x)
for some polynomial f 2 (x) ∈ Fq [x]. Define g1 (x) = p(x + α, f 1 (x) + β). Since (α, β) is a
root of p(x, y) of multiplicity m or more, p(x + α, y + β) has no monomial of degree less
than m. Setting y = f 1 (x) = x f 2 (x) in p(x + α, y + β) shows that g1 (x) is divisible by x m ,
which implies that g1 (x − α) is divisible by (x − α)m . However,
g1 (x − α) = p(x, f 1 (x − α) + β) = p(x, f (x)) = g(x),
showing that g(x) is divisible by (x − α)m .
Lemma 5.4.16 Fix positive integers m, t, and δ such that mt > δ. Let p(x, y) ∈ Fq [x, y] be
a polynomial such that (γi , yi ) is a root of p(x, y) of multiplicity at least m for 0 ≤ i ≤ n − 1.
Furthermore, assume that p(x, y) has (1, k − 1)-weighted degree at most δ. Let f (x) ∈ P k
where yi = f (γi ) for at least t values of i with 0 ≤ i ≤ n − 1. Then y − f (x) divides
p(x, y).
Proof: Let g(x) = p(x, f (x)). As p(x, y) has (1, k − 1)-weighted degree at most δ, g(x)
is either the zero polynomial or a nonzero polynomial of degree at most δ. Assume g(x)
is nonzero. Let S = {i | 0 ≤ i ≤ n − 1, f(γ_i) = y_i}. By Lemma 5.4.15, (x − γ_i)^m divides
g(x) for i ∈ S. As the γ_i are distinct, h(x) = ∏_{i∈S} (x − γ_i)^m divides g(x). Since h(x) has
degree at least mt > δ and δ is the maximum degree of g(x), we have a contradiction if
g(x) is nonzero. Thus g(x) is the zero polynomial, which implies that y = f (x) is a root of
p(x, y) viewed as a polynomial in y over the field of rational functions in x. By the Division
Algorithm, y − f (x) is a factor of this polynomial.
Lemma 5.4.17 Let p(x, y) = ∑_{j,ℓ} p_{j,ℓ} x^j y^ℓ ∈ F_q[x, y]. Suppose that (α, β) ∈ F_q^2 and
that p′(x, y) = ∑_{a,b} p′_{a,b} x^a y^b = p(x + α, y + β). Then

p′_{a,b} = ∑_{j≥a} ∑_{ℓ≥b} \binom{j}{a} \binom{ℓ}{b} α^{j−a} β^{ℓ−b} p_{j,ℓ}.

Proof: We have

p′(x, y) = ∑_{j,ℓ} p_{j,ℓ} (x + α)^j (y + β)^ℓ = ∑_{j,ℓ} p_{j,ℓ} ( ∑_{a=0}^{j} \binom{j}{a} x^a α^{j−a} ) ( ∑_{b=0}^{ℓ} \binom{ℓ}{b} y^b β^{ℓ−b} ).

Clearly, the coefficient p′_{a,b} of x^a y^b is as claimed.
We are now in a position to verify the Sudan–Guruswami Algorithm and give the error
bound for which the algorithm is valid.
Theorem 5.4.18 The Sudan–Guruswami Decoding Algorithm applied to the [n, k, n − k +
1] code GRSk (γ, v) will produce all codewords within Hamming distance e or less of a
received vector where e = n − ⌊δ/m⌋ − 1.
Proof: We first must verify that the polynomial p(x, y) from Step II actually exists. For
p(x, y) to exist, p(x + γi , y + yi ) must have no terms of degree less than m for 0 ≤ i ≤
n − 1. By Lemma 5.4.17, this is accomplished if for each i with 0 ≤ i ≤ n − 1,
∑_{j≥a} ∑_{ℓ≥b} \binom{j}{a} \binom{ℓ}{b} γ_i^{j−a} y_i^{ℓ−b} p_{j,ℓ} = 0 for all a ≥ 0, b ≥ 0 with a + b < m.    (5.22)
For each i, there are (m(m + 1))/2 equations in (5.22) since the set {(a, b) ∈ Z2 | a ≥
0, b ≥ 0, a + b < m} has size (m(m + 1))/2; hence there are a total of (nm(m + 1))/2
homogeneous linear equations in the unknown coefficients p j,ℓ . Since we wish to produce
a nontrivial polynomial of (1, k − 1)-weighted degree at most δ, there are a total of Nk−1 (δ)
unknown coefficients p j,ℓ in this system of (nm(m + 1))/2 homogeneous linear equations.
As there are fewer equations than unknowns by (5.21), a nontrivial solution exists and Step II
can be completed. By our choice of t in Step I, mt > δ. If f (x) ∈ P k is a polynomial with
f (γi ) = yi for at least t values of i, by Lemma 5.4.16, y − f (x) is a factor of p(x, y). Thus
Step III of the algorithm will produce all codewords at distance e = n − t = n − ⌊δ/m⌋ − 1
or less from the received vector.
As Step I requires computation of Nk−1 (δ), the next lemma proves useful.
Lemma 5.4.19 Let s and δ be positive integers. Then

N_s(δ) = (⌊δ/s⌋ + 1) (δ + 1 − (s/2)⌊δ/s⌋) ≥ δ(δ + 2)/(2s).

Proof: By definition,

N_s(δ) = ∑_{i=0}^{⌊δ/s⌋} ∑_{j=0}^{δ−is} 1 = ∑_{i=0}^{⌊δ/s⌋} (δ + 1 − is) = (δ + 1)(⌊δ/s⌋ + 1) − s · ⌊δ/s⌋(⌊δ/s⌋ + 1)/2
       = (⌊δ/s⌋ + 1) (δ + 1 − (s/2)⌊δ/s⌋) ≥ (δ/s) · ((δ + 2)/2).

The result follows.
Example 5.4.20 Let C be a [15, 6, 10] Reed–Solomon code over F16 . The Peterson–
Gorenstein–Zierler Decoding Algorithm can correct up to four errors. If we choose m = 2
in the Sudan–Guruswami Decoding Algorithm, then (nm(m + 1))/2 = 45 and the smallest
value of δ for which 45 < N5 (δ) is δ = 18, in which case N5 (18) = 46 by Lemma 5.4.19.
Then t = ⌊δ/m⌋ + 1 = 10 and by Theorem 5.4.18, the Sudan–Guruswami Algorithm can
correct 15 − 10 = 5 errors.
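The quantities in Lemma 5.4.19 and Example 5.4.20 are easy to check by machine; the sketch below counts monomials of bounded (1, s)-weighted degree directly and compares the count with the closed form of the lemma (whose equality is exact; only the final bound is an inequality).

```python
# Brute-force check of Lemma 5.4.19 and the counts in Example 5.4.20.

def N(s, delta):
    """Number of monomials x^i y^j with (1, s)-weighted degree i + s*j <= delta."""
    return sum(1 for j in range(delta // s + 1) for i in range(delta - s * j + 1))

def N_closed(s, delta):
    """Closed form from Lemma 5.4.19."""
    q = delta // s
    return (q + 1) * (delta + 1) - s * q * (q + 1) // 2

# Example 5.4.20: n = 15, k = 6, m = 2, so nm(m + 1)/2 = 45 equations.
n, k, m = 15, 6, 2
eqs = n * m * (m + 1) // 2
delta = next(d for d in range(1, 10 ** 4) if eqs < N(k - 1, d))
t = delta // m + 1
print(eqs, delta, N(k - 1, delta), t, n - t)   # 45 18 46 10 5
```

Changing m to 6 and rerunning gives δ = 53, the value asserted in Exercise 311.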
Exercise 311 Let C be the code of Example 5.4.20. Choose m = 6 in the Sudan–Guruswami
Algorithm. Show that the smallest value of δ for which (nm(m + 1))/2 = 315 < N5 (δ) is
δ = 53. Verify that the Sudan–Guruswami Algorithm can correct six errors with these
parameters.
Exercise 312 Let C be a [31, 8, 24] Reed–Solomon code over F32 . The Peterson–
Gorenstein–Zierler Algorithm can correct up to 11 errors.
(a) Choose m = 1 in the Sudan–Guruswami Algorithm. Find the smallest value of δ for
which (nm(m + 1))/2 = 31 < N7 (δ). Using m = 1, how many errors can the Sudan–
Guruswami Algorithm correct?
(b) Choose m = 2 in the Sudan–Guruswami Algorithm. Find the smallest value of δ for
which (nm(m + 1))/2 = 93 < N7 (δ). Using m = 2, how many errors can the Sudan–
Guruswami Algorithm correct?
(c) Choose m = 3 in the Sudan–Guruswami Algorithm. Find the smallest value of δ for
which (nm(m + 1))/2 = 186 < N7 (δ). Using m = 3, how many errors can the Sudan–
Guruswami Algorithm correct?
As can be seen in Example 5.4.20 and Exercises 311 and 312, the error-correcting
capability of the Sudan–Guruswami Decoding Algorithm can grow if m is increased. The
tradeoff for higher error-correcting capability is an increase in the (1, k − 1)-weighted
degree of p(x, y), which of course increases the complexity of the algorithm. The following
corollary gives an idea of how the error-correcting capability varies with m.
Corollary 5.4.21 The Sudan–Guruswami Decoding Algorithm applied to the [n, k, n −
k + 1] code GRS_k(γ, v) will produce all codewords within Hamming distance e or less of
a received vector, where e ≥ n − 1 − n√(R(m + 1)/m) and R = k/n.
Proof: As δ is chosen to be the smallest positive integer such that (5.21) holds,

N_{k−1}(δ − 1) ≤ nm(m + 1)/2.

By Lemma 5.4.19, N_{k−1}(δ − 1) ≥ (δ − 1)(δ + 1)/(2(k − 1)). Hence

(δ^2 − 1)/(2(k − 1)) ≤ nm(m + 1)/2.

If δ^2 < k, then δ^2/(2k) < 1/2 ≤ nm(m + 1)/2. If δ^2 ≥ k, then δ^2/(2k) ≤ (δ^2 − 1)/
(2(k − 1)) by Exercise 313. In either case,

δ^2/(2k) ≤ nm(m + 1)/2,

implying

(δ/m)^2 ≤ n^2 (k/n)((m + 1)/m),

which produces the desired result from Theorem 5.4.18.
Exercise 313 Do the following:
(a) Show that if δ^2 ≥ k, then δ^2/(2k) ≤ (δ^2 − 1)/(2(k − 1)).
(b) Show that if δ^2/(2k) ≤ nm(m + 1)/2, then δ/m ≤ n√((k/n)((m + 1)/m)).
In this corollary, R = k/n is the information rate. If m is large, we see that the fraction
e/n of errors that the Sudan–Guruswami Decoding Algorithm can correct is approximately
1 − √R. The fraction of errors that the Peterson–Gorenstein–Zierler Decoding Algorithm
can correct is approximately (1 − R)/2. Exercise 314 explores the relationship between
these two functions.
Exercise 314 Do the following:
(a) Verify that the fraction of errors that the Peterson–Gorenstein–Zierler Decoding Algorithm can correct in a GRS code is approximately (1 − R)/2.
(b) Plot the two functions y = 1 − √R and y = (1 − R)/2 for 0 ≤ R ≤ 1 on the same
graph. What do these graphs show about the comparative error-correcting capability of the Peterson–Gorenstein–Zierler and the Sudan–Guruswami Decoding
Algorithms?
To carry out the Sudan–Guruswami Decoding Algorithm we must have a method to
compute the polynomial p(x, y) of Step II and then find the factors of p(x, y) of the
form y − f (x) in Step III. (Finding p(x, y) can certainly be accomplished by solving the
nm(m + 1)/2 equations from (5.22), but as the values in Exercise 312 indicate, the number of equations and unknowns gets rather large rather quickly.) A variety of methods
have been introduced to carry out Steps II and III. We will not examine these methods
here, but the interested reader can consult [11, 178, 245, 248, 300, 354].
5.5 Burst errors, concatenated codes, and interleaving
Reed–Solomon codes, used in concatenated form, are very useful in correcting burst errors.
As the term implies, a burst error occurs when several consecutive components of a codeword may be in error; such a burst often extends over several consecutive codewords which
are received in sequence.
Before giving the actual details, we illustrate the process. Suppose that C is an [n, k]
binary code being used to transmit information. Each message from Fk2 is encoded to a
codeword from Fn2 using C. The message is transmitted then as a sequence of n binary
digits. In reality, several codewords are sent one after the other, which then appear to
the receiver as a very long string of binary digits. Along the way these digits may have
been changed. A random individual symbol may have been distorted so that one cannot
recognize it as either 0 or 1, in which case the received symbol is considered erased. Or
a random symbol could be changed into another symbol and the received symbol is in
error. As we discussed in Section 1.11, more erasures than errors can be corrected because
error locations are unknown, whereas erasure locations are known. Sometimes several
consecutive symbols, a burst, may have been erased or are in error. The receiver then breaks
up the string into codewords of length n and decodes each string, if possible. However, the
presence of burst errors can make decoding problematic as the codes we have developed
are designed to correct random errors. However, we can modify our codes to also handle
bursts. An example where the use of coding has made a significant impact is in compact disc
recording; a scratch across the disc can destroy several consecutive bits of information. The
ability to correct burst errors has changed the entire audio industry. We will take this up in the
next section.
Burst errors are often handled using concatenated codes, which are sometimes then
interleaved. Concatenated codes were introduced by Forney in [87]. We give a simple
version of concatenated codes; the more general theory can be found, for example, in [75].
Let A be an [n, k, d] code over F_q. Let Q = q^k and define ψ : F_Q → A to be a one-to-one
F_q-linear map; that is, ψ(x + y) = ψ(x) + ψ(y) for all x and y in F_Q, and ψ(αx) = αψ(x)
for all x ∈ F Q and α ∈ Fq , noting that F Q is an extension field of Fq . Let B be an [N , K , D]
code over F Q . The concatenation of A and B is the code
C = {ψ(b1 , b2 , . . . , b N ) | (b1 , b2 , . . . , b N ) ∈ B},
where ψ(b1 , b2 , . . . , b N ) = (ψ(b1 ), ψ(b2 ), . . . , ψ(b N )). C is called a concatenated code
with inner code A and outer code B. In effect, a codeword in C is obtained by taking a
codeword in B and replacing each component by a codeword of A determined by the image
of that component under ψ. The code C is then a code of length n N over Fq . The following
theorem gives further information about C.
Theorem 5.5.1 Let A, B, and C be as above. Then C is a linear [n N , k K ] code over Fq
whose minimum distance is at least d D.
Exercise 315 Prove Theorem 5.5.1.
Example 5.5.2 Let B be the [6, 3, 4] hexacode over F4 with generator matrix
1 0 0 1 ω ω
G = 0 1 0 ω 1 ω.
0 0 1 ω ω 1
Let A be the [2, 2, 1] binary code F_2^2 and define ψ : F_4 → A by the following:
ψ(0) = 00, ψ(1) = 10, ψ(ω) = 01, ψ(ω̄) = 11.
This is F2 -linear, as can be easily verified. The concatenated code C with inner code A and
outer code B has generator matrix
1 0 0 0 0 0 1 0 0 1 0 1
0 1 0 0 0 0 0 1 1 1 1 1
0 0 1 0 0 0 0 1 1 0 0 1
.
0 0 0 1 0 0 1 1 0 1 1 1
0 0 0 0 1 0 0 1 0 1 1 0
0 0 0 0 0 1 1 1 1 1 0 1
Rows two, four, and six of this matrix are obtained after multiplying the rows of G by ω.
C is a [12, 6, 4] code.
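Example 5.5.2 can be verified computationally. In the sketch below F_4 = {0, 1, ω, ω̄} is encoded as {0, 1, 2, 3} with field addition realized as XOR (an implementation choice); the 64 hexacode codewords are generated from G, mapped through ψ, and the weight distribution confirms a [12, 6, 4] binary code.

```python
# Verify Example 5.5.2: concatenate the [6, 3, 4] hexacode over F_4
# (outer code) with the [2, 2, 1] binary code F_2^2 (inner code).
from itertools import product

MUL = [[0, 0, 0, 0],           # multiplication table of F_4 = {0, 1, w, wbar}
       [0, 1, 2, 3],
       [0, 2, 3, 1],
       [0, 3, 1, 2]]
G = [[1, 0, 0, 1, 2, 2],       # generator matrix of the hexacode
     [0, 1, 0, 2, 1, 2],
     [0, 0, 1, 2, 2, 1]]
PSI = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}   # the map psi

hexacode = [[MUL[a][x] ^ MUL[b][y] ^ MUL[c][z] for x, y, z in zip(*G)]
            for a, b, c in product(range(4), repeat=3)]
concat = [tuple(bit for sym in w for bit in PSI[sym]) for w in hexacode]
weights = sorted(sum(w) for w in concat)
print(len(concat), len(concat[0]), weights[1])   # 64 codewords, length 12, min weight 4
```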
Exercise 316 Let B be the hexacode given in Example 5.5.2, and let A be the [3, 2, 2]
even binary code. Define ψ : F4 → A by the following:
ψ(0) = 000, ψ(1) = 101, ψ(ω) = 011, ψ(ω̄) = 110.
(a) Verify that ψ is F2 -linear.
(b) Give a generator matrix for the concatenated code C with inner code A and outer code
B.
(c) Show that C is an [18, 6, 8] code.
We now briefly discuss how codes, such as Reed–Solomon codes, can be used to correct
burst errors. Let C be an [n, k] code over F_q. A b-burst is a vector in F_q^n whose nonzero
coordinates are confined to b consecutive positions, the first and last of which are nonzero.
The code C is b-burst error-correcting provided there do not exist distinct codewords c1 and
c2 , and a b′ -burst u1 and a b′′ -burst u2 with b′ ≤ b and b′′ ≤ b such that c1 + u1 = c2 + u2 .
If C is a linear, b-burst error-correcting code then no b′ -burst is a codeword for any b′ with
1 ≤ b′ ≤ 2b.
Now let Q = 2^m, and let B be an [N, K, D] code over F_Q. (A good choice for B is
a Reed–Solomon code or a shortened Reed–Solomon code, as such a code will be MDS,
hence maximizing D given N and K. See Exercise 317.) Let A be the [m, m, 1] binary
code F_2^m. Choosing a basis e_1, e_2, . . . , e_m of F_Q = F_{2^m} over F_2, we define ψ : F_Q → A
by ψ(a_1e_1 + · · · + a_me_m) = a_1 · · · a_m. The map ψ is one-to-one and F_2-linear; see Exercise 319. If we refer to elements of F_{2^m} as bytes and elements of F_2 as bits, each component
byte of a codeword in B is replaced by the associated vector from F_2^m of m bits to form the
corresponding codeword in C. The concatenated code C with inner code A and outer code
B is an [n, k, d] binary code with n = mN, k = mK, and d ≥ D. This process is illustrated
by the concatenated code constructed in Example 5.5.2. There the basis of F_4 is e_1 = 1
and e_2 = ω, and the bytes (elements of F_4) each correspond to two bits determined by the
map ψ.
Exercise 317 Let C be an [n, k, n − k + 1] MDS code over Fq . Let C 1 be the code obtained
from C by shortening on some coordinate. Show that C 1 is an [n − 1, k − 1, n − k + 1]
code; that is, show that C 1 is also MDS.
Exercise 318 In compact disc recording two shortened Reed–Solomon codes over F256
are used. Beginning with a Reed–Solomon code of length 255, explain how to obtain
[32, 28, 5] and [28, 24, 5] shortened Reed–Solomon codes.
Exercise 319 Let A be the [m, m, 1] binary code F_2^m. Let e_1, e_2, . . . , e_m be a basis of
F_{2^m} over F_2. Define ψ : F_{2^m} → A by ψ(a_1e_1 + · · · + a_me_m) = a_1 · · · a_m. Prove that ψ is
one-to-one and F_2-linear.
Let C be the binary concatenated [mN, mK] code, as above, with inner code A and
outer code B. Let c = ψ(b) be a codeword in C where b is a codeword in B. Let u be a
b-burst in F_2^{mN}. We can break the burst into N strings of m bits each and use ψ^{−1} to map
the burst into a vector of N bytes. More formally, let u = u_1 · · · u_N, where u_i ∈ F_2^m for
1 ≤ i ≤ N. Map u into the vector u′ = ψ^{−1}(u_1) · · · ψ^{−1}(u_N) in F_Q^N. As u is a b-burst,
the number of nonzero bytes of u′ is, roughly speaking, at most b/m. For instance, if u is
a (3m + 1)-burst, then at most four of the bytes of u′ can be nonzero; see Exercise 320. So
burst error-correction is accomplished as follows. Break the received codeword into N bit
strings of length m, apply ψ^{−1} to each bit string to produce a vector in F_Q^N, and correct that
vector using B. More on this is left to Exercise 320.
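A quick numeric illustration of the byte count just described, with hypothetical parameters m = 5 and N = 8: sliding a burst of length 3m + 1 = 16 across the bit string never touches more than four of the 5-bit bytes.

```python
# Numeric illustration: a burst of length 3m + 1 can touch at most
# four of the N bytes, wherever it is placed (here m = 5, N = 8).

m, N = 5, 8
b = 3 * m + 1                                    # burst length 16
for start in range(m * N - b + 1):               # every placement of the burst
    burst = [0] * (m * N)
    for pos in range(start, start + b):
        burst[pos] = 1                           # worst case: every burst position in error
    bytes_hit = sum(any(burst[i * m:(i + 1) * m]) for i in range(N))
    assert bytes_hit <= 4
print("every placement of a 16-bit burst touches at most 4 of the 5-bit bytes")
```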
Exercise 320 Let A = Fm
2 be the [m, m, 1] binary code of length m and let B be an
[N , K , D] code over F2m . Let ψ be a one-to-one F2 -linear map from F2m onto Fm
2 . Let C
be the concatenated code with inner code A and outer code B.
N
(a) Let u be a b-burst of length m N in Fm
associated to u′ , a vector in F NQ where Q = 2m .
2
′
Let b ≤ am + 1. Show that wt(u ) ≤ a + 1.
(b) Show that C corrects bursts of length b ≤ am + 1, where a = ⌊(D − 1)/2⌋ − 1.
(c) Let m = 5 and let B be a [31, 7, 25] Reed–Solomon code over F32 . What is the maximum
length burst that the [156, 35] binary concatenated code C can correct?
There is another technique, called interleaving, that will improve the burst error-correcting
capability of a code. Let C be an [n, k] code over Fq that can correct a burst of length b.
Define I(C, t) to be a set of vectors in F_q^{nt} constructed as follows. For any set of t codewords
c_1, . . . , c_t from C, with c_i = c_{i1}c_{i2} · · · c_{in}, form the matrix

M = | c_11  c_12  · · ·  c_1n |
    | c_21  c_22  · · ·  c_2n |
    |  ⋮     ⋮            ⋮   |
    | c_t1  c_t2  · · ·  c_tn |
whose rows are the codewords c1 , . . . , ct . The codewords of I (C, t) are the vectors
c11 c21 · · · ct1 c12 c22 · · · ct2 · · · c1n c2n · · · ctn
of length nt obtained from M by reading down consecutive columns. The code I (C, t)
is C interleaved to depth t.
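The construction of I (C, t) amounts to writing the t codewords as rows and reading off columns; a minimal sketch (function names are ours), together with the inverse operation a receiver would use:

```python
def interleave(codewords):
    """Form an I(C, t) codeword from t codewords of C: read the t x n
    matrix whose rows are the codewords down consecutive columns."""
    t, n = len(codewords), len(codewords[0])
    return [codewords[i][j] for j in range(n) for i in range(t)]

def deinterleave(v, t):
    """Recover the t original codewords from an interleaved vector."""
    return [v[i::t] for i in range(t)]

c1, c2, c3 = [1, 2, 3], [4, 5, 6], [7, 8, 9]
v = interleave([c1, c2, c3])
print(v)                    # [1, 4, 7, 2, 5, 8, 3, 6, 9]
print(deinterleave(v, 3))   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Any run of consecutive positions in the interleaved vector meets each row only every t-th symbol, which is what spreads a long burst into short bursts in the individual codewords.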
Theorem 5.5.3 If C is an [n, k] code over Fq that can correct any burst of length b, then
I (C, t) is an [nt, kt] code over Fq that can correct any burst of length bt.
Exercise 321 Prove Theorem 5.5.3.
Example 5.5.4 The [7, 4, 3] binary Hamming code H3 can correct only bursts of length
1. However, interleaving H3 to depth 4 produces a [28, 16] binary code I (H3 , 4) that can
correct bursts of length 4. Note, however, that the minimum distance of I (H3 , 4) is 3 and
hence this code can correct single errors, but not all double errors. It can correct up to four
errors as long as they are confined to four consecutive components. See Exercise 322.
Exercise 322 Prove that the minimum distance of I (H3 , 4) is 3.
5.6 Coding for the compact disc
In this section we give an overview of the encoding and decoding used for the compact disc
(CD) recorder. The compact disc digital audio system standard currently in use was developed by N. V. Philips of The Netherlands and Sony Corporation of Japan in an agreement
signed in 1979. Readers interested in further information on coding for CDs should consult
[47, 99, 119, 133, 154, 202, 253, 286, 335, 341].
A compact disc is an aluminized disc, 120 mm in diameter, which is coated with a
clear plastic coating. On each disc is one spiral track, approximately 5 km in length (see
Exercise 324), which is optically scanned by an AlGaAs laser, with wavelength approximately 0.8 µm, operating at a constant speed of about 1.25 m/s. The speed of rotation of
the disc varies from approximately 8 rev/s for the inner portion of the track to 3.5 rev/s for
the outer portion. Along the track are depressions, called pits, and flat segments between
pits, called lands. The width of the track is 0.6 µm and the depth of a pit is 0.12 µm. The
laser light is reflected with differing intensities between pits and lands because of interference. The data carried by these pits and lands is subject to error due to such problems
as stray particles on the disc or embedded in the disc, air bubbles in the plastic coating,
fingerprints, or scratches. These errors tend to be burst errors; fortunately, there is a very
efficient encoding and decoding system involving both shortened Reed–Solomon codes and
interleaving.
5.6.1 Encoding
We first describe how audio data is encoded and placed on a CD. Sound-waves are first
converted from analog to digital using sampling. The amplitude of a waveform is sampled
at a given point in time and assigned a binary string of length 16. As before we will call a
binary digit 0 or 1 from F2 a bit. Because the sound is to be reproduced in stereo, there are
actually two samples taken at once, one for the left channel and one for the right. Waveform
sampling takes place at the rate of 44 100 pairs of samples per second (44.1 kHz). (The
sampling rate of 44.1 kHz was chosen to be compatible with a standard already existing
for video recording.) Thus each sample produces two binary vectors from F_2^16, one for each
channel. Each vector from F_2^16 is cut in half, and each half is used to represent an element of
the field F_{2^8}, which as before we call a byte. Each sample then produces four bytes of data. For every
second of sound recording, 44 100 · 32 = 1 411 200 bits or 44 100 · 4 = 176 400 bytes are
generated. We are now ready to encode the bytes. This requires the use of two shortened
Reed–Solomon codes, C 1 and C 2 , and two forms of interleaving. This combination is called
a cross-interleaved Reed–Solomon code or CIRC. The purpose of the cross-interleaving,
which is a variation of interleaving, is to break up long burst errors.
Step I: Encoding using C 1 and interleaving
The bytes are encoded in the following manner. Six samples of four bytes each are grouped
together to form a frame consisting of 24 bytes. We can view a frame as L 1 R1 L 2 R2 · · · L 6 R6 ,
where L i is two bytes for the left channel from the ith sample of the frame, and Ri is two
bytes for the right channel from the ith sample of the frame. Before any encoding is done,
the bytes are permuted in two ways. First, the odd-numbered samples L 1 R1 , L 3 R3 , and
L 5 R5 are grouped with the even-numbered samples L′2 R′2 , L′4 R′4 , and L′6 R′6 (marked
with a prime) taken from two frames later. So we are now looking at a new frame of 24 bytes:

L 1 R1 L′2 R′2 L 3 R3 L′4 R′4 L 5 R5 L′6 R′6 .
Thus samples that originally were consecutive in time are now two frames apart. Second,
these new frames are rearranged internally into 24 bytes by separating the odd-numbered
samples from the even-numbered samples to form
L 1 L 3 L 5 R1 R3 R5 L′2 L′4 L′6 R′2 R′4 R′6 .
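The two permutations can be modeled directly. In the sketch below (names ours) each two-byte sample is represented by a single label tagged with its frame number, so the samples drawn from two frames later are easy to spot in the output:

```python
# Illustrative model of the two byte permutations applied before C1
# encoding.  Each frame is a list of 12 samples L1 R1 L2 R2 ... L6 R6;
# here a "sample" is one label standing for its two bytes.

def regroup(frames):
    """Pair the odd-numbered samples of frame k with the even-numbered
    samples taken from frame k + 2 (two frames later)."""
    out = []
    for k in range(len(frames) - 2):
        cur, later = frames[k], frames[k + 2]
        new = []
        for i in range(3):
            new += cur[4 * i:4 * i + 2]        # L(2i+1) R(2i+1), current frame
            new += later[4 * i + 2:4 * i + 4]  # L(2i+2) R(2i+2), two frames later
        out.append(new)
    return out

def separate(frame):
    """Rearrange into L1 L3 L5 R1 R3 R5 | L2 L4 L6 R2 R4 R6."""
    return frame[0::4] + frame[1::4] + frame[2::4] + frame[3::4]

# four frames, with sample labels tagged by frame number
frames = [[f"{ch}{i}f{k}" for i in range(1, 7) for ch in ("L", "R")]
          for k in range(4)]
print(separate(regroup(frames)[0]))
# odd-numbered samples of frame 0, then even-numbered samples of frame 2
```

The output shows the odd-numbered samples of one frame separated as far as possible from the even-numbered samples, which come from two frames later.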
Row 1:  c1,1  c2,1  c3,1  c4,1  c5,1  c6,1  c7,1  c8,1  c9,1  c10,1  c11,1  c12,1  c13,1  · · ·
Row 2:  0     0     0     0     c1,2  c2,2  c3,2  c4,2  c5,2  c6,2   c7,2   c8,2   c9,2   · · ·
Row 3:  0     0     0     0     0     0     0     0     c1,3  c2,3   c3,3   c4,3   c5,3   · · ·
Row 4:  0     0     0     0     0     0     0     0     0     0      0      0      c1,4   · · ·
  ...

Figure 5.2 4-frame delay interleaving (first four of the 28 rows shown).
This separates samples as far apart as possible within the new frame. These permutations
of bytes allow for error concealment as we discuss later. This 24-byte message consisting
of a vector in F_256^24 is encoded using a systematic encoder for a [28, 24, 5] shortened Reed–
Solomon code, which we denote C 1 . This encoder produces four bytes of redundancy, that
is, two pairs P1 and P2 each with two bytes of parity, which are then placed in the middle
of the above to form

L 1 L 3 L 5 R1 R3 R5 P1 P2 L′2 L′4 L′6 R′2 R′4 R′6 ,
further separating the odd-numbered samples from the even-numbered samples.
Thus from C 1 we produce a string of 28-byte codewords which we interleave to a depth of
28 using 4-frame delay interleaving as we now describe. Begin with codewords c1 , c2 , c3 , . . .
from C 1 in the order they are generated. Form an array with 28 rows and a large number of
columns in the following fashion. Row 1 consists of the first byte of c1 in column 1, the first
byte of c2 in column 2, the first byte of c3 in column 3, etc. Row 2 begins with four bytes
equal to 0 followed by the second byte of c1 in column 5, the second byte of c2 in column
6, the second byte of c3 in column 7, etc. Row 3 begins with eight bytes equal to 0 followed
by the third byte of c1 in column 9, the third byte of c2 in column 10, the third byte of c3
in column 11, etc. Continue in this manner filling out all 28 rows. If ci = ci,1 ci,2 · · · ci,28 ,
the resulting array begins as in Figure 5.2, and thus the original codewords are found going
diagonally down this array with slope −1/4. This array will be as long as necessary to
accommodate all the encoded frames of data. All the rows except row 28 will need to be
padded with zeros so that the array is rectangular; see Exercise 323.
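A sketch of the 4-frame delay interleaver just described (0-based indices; names ours):

```python
def delay_interleave(codewords, rows=28, delay=4):
    """Build the array of Figure 5.2: byte j of codeword i (0-based)
    goes in row j, column i + delay*j; all other entries are 0."""
    ncols = len(codewords) + delay * (rows - 1)  # zero-padding keeps it rectangular
    array = [[0] * ncols for _ in range(rows)]
    for i, c in enumerate(codewords):
        for j in range(rows):
            array[j][i + delay * j] = c[j]
    return array

# three 28-byte codewords; byte j of codeword i is labeled (i+1)*100 + (j+1)
cws = [[(i + 1) * 100 + j + 1 for j in range(28)] for i in range(3)]
arr = delay_interleave(cws)
print(arr[0][:6])   # [101, 201, 301, 0, 0, 0]
print(arr[1][:6])   # [0, 0, 0, 0, 102, 202]
```

Codeword i lies on a diagonal of slope −1/4: consecutive bytes of one codeword are four columns apart, so a burst confined to a few columns touches few bytes of any single codeword.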
Exercise 323 Suppose a CD is used to record 72 minutes of sound. How many frames
of data does this represent? How long is the array obtained by interleaving the codewords in C 1 corresponding to all these frames to a depth of 28 using 4-frame delay
interleaving?
Step II: Encoding using C 2 and interleaving
Each column of the array is a vector in F_256^28, which is then encoded using a [32, 28, 5]
shortened Reed–Solomon code C 2 . Thus we now have a list of codewords, which are generated
in the order of the columns, each consisting of 32 bytes. The codewords are regrouped with
the odd-numbered symbols of one codeword grouped with the even-numbered symbols of
the next codeword. This regrouping is another form of interleaving which further breaks
up short bursts that may still be present after the 4-frame delay interleaving. The regrouped
bytes are written consecutively in one long stream. We now re-divide this long string into
segments of 32 bytes, with 16 bytes from one C 2 codeword and 16 bytes from another C 2
codeword because of the above regrouping. At the end of each of these segments a 33rd byte
is added, which contains control and display information.¹ Thus each frame of six samples
eventually leads to 33 bytes of data. A schematic of the encoding using C 1 and C 2 can be
found in [154].
Step III: Imprinting and EFM
Each byte of data must now be imprinted onto the disc. First, the bytes are converted to
strings of bits using EFM described shortly. Each bit is of length 0.3 µm when imprinted
along the track. Each land-to-pit or pit-to-land transition is imprinted with a single 1, while
the track along the pit or land is imprinted with a string of 0s whose number corresponds
to the length of the pit or land. For example, a pit of length 2.1 µm followed by a land of
length 1.2 µm corresponds to the string 10000001000. For technical reasons each land or
pit must be between 0.9 and 3.3 µm in length. Therefore each pair of 1s is separated by
at least two 0s and at most ten 0s. Thus the 256 possible bytes must be converted to bit
strings in such a way that this criterion is satisfied. Were it not for this condition, one could
convert the bytes to elements of F_2^8; it turns out that the smallest string length such that
there are at least 256 different strings where each 1 is separated by at least two 0s but no
more than ten 0s is length 14. In fact there are 267 binary strings of length 14 satisfying
this condition; 11 of these are not used. This conversion from bytes to strings of length 14
is called EFM or eight-to-fourteen modulation. Note, however, that bytes must be encoded
in succession, and so two consecutive 14-bit strings may fail to satisfy our conditions
on minimum and maximum numbers of 0s between 1s. For example 10010000000100
and 00000000010001 are both allowable strings but if they follow one after the other, we
obtain
1001000000010000000000010001,
which has 11 consecutive 0s. To overcome this problem three additional bits called merge
bits are added to the end of each 14-bit string. In our example, if we add 001 to the end of
the first string, we have
1001000000010000100000000010001,
which satisfies our criterion. So our frame of six samples leads to 33 bytes each of which
is converted to 17 bits. Finally, at the end of these 33 · 17 bits, 24 synchronization bits plus
three merging bits are added; thus each frame of six samples leads to 588 bits.
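The count of 267 can be verified by brute force. The sketch below enumerates all 2^14 binary strings, treating the separation condition as a constraint on every run of 0s — runs between two 1s must have length at least 2, and no run, including the leading and trailing ones, may exceed 10 (the boundary reading under which the count comes out to exactly 267). It also checks the bits-per-frame arithmetic:

```python
from itertools import product

def efm_valid(bits):
    """Runs of 0s between two 1s have length >= 2; no run of 0s
    anywhere (including leading and trailing runs) exceeds 10."""
    runs = "".join(map(str, bits)).split("1")
    if bits.count(1) >= 2 and any(len(r) < 2 for r in runs[1:-1]):
        return False            # two 1s separated by fewer than two 0s
    return all(len(r) <= 10 for r in runs)

count = sum(efm_valid(b) for b in product((0, 1), repeat=14))
print(count)                    # 267 valid strings; 256 are used, 11 are not

# Each frame: 33 bytes of 14 EFM bits + 3 merge bits,
# plus 24 synchronization bits and 3 more merge bits:
print(33 * (14 + 3) + 24 + 3)   # 588
```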
Exercise 324 Do the following; see the related Exercise 323:
(a) How many bits are on the track of a CD with 72 minutes of sound?
(b) How long must the track be if each bit is 0.3 µm in length?
¹ The control and display bytes include information for the listener such as playing time, composer, and title of
the piece, as well as technical information required by the CD player.
5.6.2 Decoding
We are now ready to see how decoding and error-correction are performed by the CD player.²
The process reverses the encoding.
Step I: Decoding with C 2
First, the synchronization bits, control and display bits, and merging bits are removed. Then
the remaining binary strings are converted from EFM form into byte form, a process called
demodulation, using table look-up; we now have our data as a stream of bytes. Next we
undo the scrambling done in the encoding process. The stream is divided into segments of
32 bytes. Each of these 32-byte segments contains odd-numbered bytes from one codeword
(with possible errors, of course) and even-numbered bytes from the next. The bytes in the
segments are regrouped to restore the positions in order and are passed on to the decoder
for C 2 . Note that if a short burst error had occurred on the disc, the burst may be split up into
shorter bursts by the regrouping. As C 2 is a [32, 28, 5] code over F256 , it can correct two
errors. However, it is only used to correct single errors or detect the presence of multiple
errors, including all errors of size two or three and some of larger size. The sphere of
radius 1 centered at some codeword c1 does not contain a vector that differs from another
codeword c2 in at most three positions as c1 and c2 are distance at least five from one another.
Therefore, if a single error has occurred, C 2 can correct that error; if two or three errors have
occurred, C 2 can detect the presence of those errors (but will not be used to correct them).
What is the probability that C 2 will fail to detect four or more errors when we use C 2 to
correct only single errors? Such a situation would arise if errors are made in one codeword
so that the resulting vector lies in a sphere of radius 1 about another codeword. Assuming
all vectors are equally likely, the probability of this occurring is approximately the ratio of
the total number of vectors inside spheres of radius 1 centered at codewords to the total
number of vectors in F_256^32. This ratio is

256^28 [1 + 32(256 − 1)] / 256^32 = 8161 / 256^4 ≈ 1.9 × 10^−6.
By Exercise 325, if C 2 were used to its full error-correcting capability by correcting all
double errors, then the probability that three or more errors would go undetected is about
7.5 × 10^−3. The difference in these probabilities indicates why the full error-correcting
capability of C 2 is not used since the likelihood of three or more errors going undetected
(or being miscorrected) is much higher with full error-correction.
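Both estimates are one-line computations; a sketch under the same assumption that all vectors are equally likely:

```python
from math import comb

q, n = 256, 32
v1 = 1 + n * (q - 1)                 # vectors in a sphere of radius 1
v2 = v1 + comb(n, 2) * (q - 1) ** 2  # vectors in a sphere of radius 2

# q**28 codewords out of q**32 vectors in total:
p1 = q**28 * v1 / q**32
p2 = q**28 * v2 / q**32
print(f"{p1:.1e}")   # 1.9e-06: single-error-correcting use of C2
print(f"{p2:.1e}")   # 7.5e-03: full double-error-correcting use
```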
Exercise 325 Verify the following:
(a) A sphere of radius 2 centered at a codeword of a [32, 28] code over F256 contains

1 + 32(256 − 1) + (32 choose 2)(256 − 1)^2

vectors in F_256^32.
² The encoding of a digital video disc (DVD) involves Reed–Solomon codes in a fashion similar
to the CD. Tom Høholdt has created a simulation of DVD decoding at the following web site:
http://www.mat.dtu.dk/persons/Hoeholdt Tom/.
(b) The probability that three or more errors would go undetected using the double error-correcting capability of a [32, 28, 5] code over F256 is about 7.5 × 10^−3.
Step II: Decoding with C 1
If the decoder for C 2 determines that no errors in a 32-byte string are found, the 28-byte message is extracted and passed on to the next stage. If the decoder for C 2 detects a
single error, the error is corrected and the 28-byte message is passed on. If the decoder
detects more than one error, it passes on a 28-byte string with all components flagged
as erasures. These 28-byte strings correspond to the columns of the array in Figure 5.2,
possibly with erasures. The diagonals of slope −1/4 are passed on as 28-byte received
vectors to the decoder for C 1 . C 1 can be used in different ways. In one scheme it is used only
to correct erasures. By Theorem 1.11.6, C 1 can correct four erasures. Due to the 4-frame
delay interleaving and the ability of C 1 to correct four erasures, a burst error covering 16
consecutive 588-bit strings on the disc can be corrected. Such a burst is approximately 2.8
mm in length along the track! In another scheme, again applying Theorem 1.11.6, C 1 is
used to correct one error (which may have escaped the decoding performed by C 2 ) and two
erasures. A comparison of the two schemes can be found in [253].
Step III: Errors that still survive
It is possible that there are samples that cannot be corrected by the use of C 2 and C 1
but are detected as errors and hence remain erased. One technique used is to “conceal” the
error. Recall that consecutive samples were separated by two frames before any encoding was
performed. When the final decoding is completed and these samples are brought back to their
correct order, it is likely that the neighboring samples were correct or had been corrected.
If this is the case, then the erased sample is replaced by an approximation obtained by
linear interpolation using the two reliable samples on either side of the sample in error.
Listening tests have shown that this process is essentially undetectable. If the neighbors
are unreliable, implying a burst is still present, so that interpolation is not possible, then
“muting” is used. Starting 32 samples prior to the burst, the reliable samples are gradually
weakened until the burst occurs, the burst is replaced by a zero-valued sample, and the
next 32 reliable samples are gradually strengthened. As this muting process occurs over
a few milliseconds, it is essentially inaudible. Both linear interpolation and muting mask
“clicks” that may otherwise occur. More details on interpolation and muting can be found
in [154, 253].
6 Duadic codes
In Chapter 5 we described the family of cyclic codes called BCH codes and its subfamily of
RS codes. In this chapter we define and study another family of cyclic codes called duadic
codes. They are generalizations of quadratic residue codes, which we discuss in Section 6.6.
Binary duadic codes were initially defined in [190] and were later generalized to arbitrary
finite fields in [266, 270, 301, 315].
6.1 Definition and basic properties
We will define duadic codes in two different ways and show that the definitions are equivalent. We need some preliminary notation and results before we begin. Throughout this
chapter Zn will denote the ring of integers modulo n. We will also let E n denote the subcode of even-like vectors in Rn = Fq [x]/(x^n − 1). The code E n is an [n, n − 1] cyclic code
whose dual code E n⊥ is the repetition code of length n. By Exercise 221 the repetition code
has generating idempotent

j(x) = (1/n)(1 + x + x^2 + · · · + x^{n−1}).
So by Theorem 4.4.9, E n has generating idempotent 1 − j(x)µ−1 = 1 − j(x). We summarize this information in the following lemma.
Lemma 6.1.1 The code E n has the following properties:
(i) E n is an [n, n − 1] cyclic code.
(ii) E n⊥ is the repetition code with generating idempotent j(x) = (1/n)(1 + x + x^2 + · · · + x^{n−1}).
(iii) E n has generating idempotent 1 − j(x).
In defining the duadic codes, we will obtain two pairs of codes; one pair will be two
even-like codes, which are thus subcodes of E n , and the other pair will be odd-like codes. It
will be important to be able to tell when either a vector or a cyclic code in Rn is even-like
or odd-like.
Lemma 6.1.2 Let a(x) = Σ_{i=0}^{n−1} ai x^i ∈ Rn . Also let C be a cyclic code in Rn with generator
polynomial g(x). Then:
(i) a(x) is even-like if and only if a(1) = 0 if and only if a(x) j(x) = 0,
(ii) a(x) is odd-like if and only if a(1) ≠ 0 if and only if a(x) j(x) = α j(x) for some nonzero
α ∈ Fq ,
(iii) C is even-like if and only if g(1) = 0 if and only if j(x) ∉ C, and
(iv) C is odd-like if and only if g(1) ≠ 0 if and only if j(x) ∈ C.
Proof: Parts (ii) and (iv) follow from (i) and (iii), respectively. By definition, a(x) is
even-like precisely when Σ_{i=0}^{n−1} ai = 0. This is the same as saying a(1) = 0. That this is equivalent
to a(x) j(x) = 0 follows from Exercise 221. This verifies (i). In part (iii), C is even-like if
and only if g(1) = 0 from Exercise 238. Note that (x − 1)n j(x) = x^n − 1 in Fq [x]. As
g(x) | (x^n − 1) and x^n − 1 has distinct roots, g(x) | j(x) if and only if g(1) ≠ 0. Since
j(x) ∈ C if and only if g(x) | j(x), part (iii) follows.
We first define duadic codes in terms of their idempotents. Duadic codes come in two
pairs, one even-like pair, which we usually denote C 1 and C 2 , and one odd-like pair, usually
denoted D1 and D2 . Let e1 (x) and e2 (x) be two even-like idempotents with C 1 = ⟨e1 (x)⟩
and C 2 = ⟨e2 (x)⟩. The codes C 1 and C 2 form a pair of even-like duadic codes provided the
following two criteria are met:
I. The idempotents satisfy
e1 (x) + e2 (x) = 1 − j(x), and   (6.1)
II. there is a multiplier µa such that
C 1 µa = C 2 and C 2 µa = C 1 .   (6.2)
If c(x) ∈ C i , then c(x)ei (x) = c(x) implying that c(1) = c(1)ei (1) = 0 by Lemma 6.1.2(i);
thus both C 1 and C 2 are indeed even-like codes. We remark that e1 (x)µa = e2 (x) and
e2 (x)µa = e1 (x) if and only if C 1 µa = C 2 and C 2 µa = C 1 by Theorem 4.3.13(i); thus we
can replace (6.2) in part II by
e1 (x)µa = e2 (x) and e2 (x)µa = e1 (x).   (6.3)
Associated to C 1 and C 2 is the pair of odd-like duadic codes

D1 = ⟨1 − e2 (x)⟩ and D2 = ⟨1 − e1 (x)⟩.   (6.4)
As 1 − ei (1) = 1, by Lemma 6.1.2(ii), D1 and D2 are odd-like codes. We say that µa gives a
splitting for the even-like duadic codes C 1 and C 2 or for the odd-like duadic codes D1 and D2 .
Exercise 326 Prove that if C 1 and C 2 form a pair of even-like duadic codes and C 1
and C ′2 are also a pair of even-like duadic codes, then C 2 = C ′2 . (This exercise shows that
if we begin with a code C 1 that is one code in a pair of even-like duadic codes, there is no
ambiguity as to what code it is paired with.)
The following theorem gives basic facts about these four codes.
Theorem 6.1.3 Let C 1 = ⟨e1 (x)⟩ and C 2 = ⟨e2 (x)⟩ be a pair of even-like duadic codes of
length n over Fq . Suppose µa gives the splitting for C 1 and C 2 . Let D1 and D2 be the
associated odd-like duadic codes. Then:
(i) e1 (x)e2 (x) = 0,
(ii) C 1 ∩ C 2 = {0} and C 1 + C 2 = E n ,
(iii) n is odd and C 1 and C 2 each have dimension (n − 1)/2,
(iv) D1 is the cyclic complement of C 2 and D2 is the cyclic complement of C 1 ,
(v) D1 and D2 each have dimension (n + 1)/2,
(vi) C i is the even-like subcode of Di for i = 1, 2,
(vii) D1 µa = D2 and D2 µa = D1 ,
(viii) D1 ∩ D2 = ⟨j(x)⟩ and D1 + D2 = Rn , and
(ix) Di = C i + ⟨j(x)⟩ = ⟨j(x) + ei (x)⟩ for i = 1, 2.
Proof: Multiplying (6.1) by e1 (x) gives e1 (x)e2 (x) = 0, by Lemma 6.1.2(i). So (i) holds.
By Theorem 4.3.7, C 1 ∩ C 2 and C 1 + C 2 have generating idempotents e1 (x)e2 (x) = 0 and
e1 (x) + e2 (x) − e1 (x)e2 (x) = e1 (x) + e2 (x) = 1 − j(x), respectively. Thus part (ii) holds
by Lemma 6.1.1(iii). By (6.2), C 1 and C 2 are equivalent, and hence have the same dimension.
By (ii) and Lemma 6.1.1(i), this dimension is (n − 1)/2, and hence n is odd giving (iii).
The cyclic complement of C i has generating idempotent 1 − ei (x) by Theorem 4.4.6(i);
thus part (iv) is immediate from the definition of Di . Part (v) follows from the definition of cyclic complement and parts (iii) and (iv). As D1 is odd-like with generating
idempotent 1 − e2 (x), by Exercise 257, the generating idempotent of the even-like subcode of D1 is 1 − e2 (x) − j(x) = e1 (x). Thus C 1 is the even-like subcode of D1 ;
analogously C 2 is the even-like subcode of D2 yielding (vi). The generating idempotent of
D1 µa is (1 − e2 (x))µa = 1 − e2 (x)µa = 1 − e1 (x) by Theorem 4.3.13(i) and (6.3). Thus
D1 µa = D2 ; analogously D2 µa = D1 producing (vii). By Theorem 4.3.7, D1 ∩ D2 and
D1 + D2 have generating idempotents (1 − e2 (x))(1 − e1 (x)) = 1 − e1 (x) − e2 (x) = j(x)
and (1 − e2 (x)) + (1 − e1 (x)) − (1 − e2 (x))(1 − e1 (x)) = 1, respectively, as e1 (x)e2 (x) =
0. Thus (viii) holds as the generating idempotent of Rn is 1. Finally by (iii), (v), and
(vi), C i is a subspace of Di of codimension 1; as j(x) ∈ Di \ C i , Di = C i + ⟨j(x)⟩. Also
Di = ⟨j(x) + ei (x)⟩ by (6.1) and (6.4).
Example 6.1.4 We illustrate the definition of duadic codes by constructing the generating
idempotents of the binary duadic codes of length 7. The 2-cyclotomic cosets modulo 7
are C0 = {0}, C1 = {1, 2, 4}, and C3 = {3, 6, 5}. Recall from Corollary 4.3.15 that every
binary idempotent in Rn is of the form e(x) = Σ_{j∈J} Σ_{i∈C_j} x^i and all such polynomials
are idempotents. Thus there are 2^3 = 8 idempotents, with four being even-like. These
are e0 (x) = 0, e1 (x) = 1 + x + x 2 + x 4 , e2 (x) = 1 + x 3 + x 5 + x 6 , and e3 (x) = x + x 2 +
x 3 + x 4 + x 5 + x 6 . But e0 (x) generates {0} and e3 (x) generates E 7 ; see Exercise 327. So
the only possible generating idempotents for even-like duadic codes are e1 (x) and e2 (x).
Note that e1 (x) + e2 (x) = 1 − j(x) giving (6.1); also e1 (x)µ3 = e2 (x) and e2 (x)µ3 = e1 (x)
giving (6.3). Thus there is one even-like pair of duadic codes of length 7 with one associated
odd-like pair having generating idempotents 1 − e2 (x) = x 3 + x 5 + x 6 and 1 − e1 (x) =
x + x 2 + x 4 ; the latter are Hamming codes.
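Conditions (6.1) and (6.3) for this example can be verified mechanically; a sketch representing each polynomial in R7 by its coefficient list over F2 (names ours):

```python
n = 7  # length; polynomials in R_7 are coefficient lists mod x^7 - 1

def poly(exponents):
    f = [0] * n
    for i in exponents:
        f[i] ^= 1
    return f

def add(f, g):
    return [(a + b) % 2 for a, b in zip(f, g)]

def multiplier(f, a):
    """Apply mu_a, sending x^i to x^(a*i mod n)."""
    g = [0] * n
    for i, c in enumerate(f):
        g[a * i % n] ^= c
    return g

e1 = poly([0, 1, 2, 4])          # e1(x) = 1 + x + x^2 + x^4
e2 = poly([0, 3, 5, 6])          # e2(x) = 1 + x^3 + x^5 + x^6
one_minus_j = poly(range(1, 7))  # 1 - j(x) = x + x^2 + ... + x^6 over F_2

print(add(e1, e2) == one_minus_j)   # True: condition (6.1)
print(multiplier(e1, 3) == e2)      # True: e1(x) mu_3 = e2(x)
print(multiplier(e2, 3) == e1)      # True: e2(x) mu_3 = e1(x)
```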
Exercise 327 Prove that x + x 2 + · · · + x n−1 is the idempotent of the even-like code E n
of length n over F2 .
Exercise 328 Find idempotent generators of all of the binary even-like and odd-like duadic
codes of length n = 17 and n = 23. The odd-like duadic codes of length 23 are Golay
codes.
Exercise 329 We could have defined duadic codes by beginning with the generating
idempotents of the odd-like duadic codes. Let D1 and D2 be odd-like cyclic codes of length
n over Fq with generating idempotents d1 (x) and d2 (x). Show that D1 and D2 are odd-like
duadic codes if and only if:
I′′ . the idempotents satisfy d1 (x) + d2 (x) = 1 + j(x), and
II′′ . there is a multiplier µa such that D1 µa = D2 and D2 µa = D1 .
Duadic codes can also be defined in terms of their defining sets (and thus ultimately by
their generator polynomials). Let C 1 and C 2 be a pair of even-like duadic codes defined by
I and II above. As these are cyclic codes, C 1 and C 2 have defining sets T1 = {0} ∪ S1 and
T2 = {0} ∪ S2 , respectively, relative to some primitive nth root of unity. Each of the sets S1
and S2 is a union of nonzero q-cyclotomic cosets. By Theorem 6.1.3(iii) C 1 and C 2 each have
dimension (n − 1)/2; by Theorem 4.4.2, S1 and S2 each have size (n − 1)/2. The defining
set of C 1 ∩ C 2 = {0} is T1 ∪ T2 , which must then be {0,1, . . . , n − 1} by Exercise 239. Thus
S1 ∪ S2 = {1, 2, . . . , n − 1}; since each Si has size (n − 1)/2, S1 ∩ S2 = ∅. By (6.2) and
Corollary 4.4.5, T1 µa −1 = T2 and T2 µa −1 = T1 . Therefore S1 µa −1 = S2 and S2 µa −1 = S1 .
This leads to half of the following theorem.
Theorem 6.1.5 Let C 1 and C 2 be cyclic codes over Fq with defining sets T1 = {0} ∪ S1 and
T2 = {0} ∪ S2 , respectively, where 0 ∉ S1 and 0 ∉ S2 . Then C 1 and C 2 are a pair of even-like
duadic codes if and only if:
I′ . S1 and S2 satisfy
S1 ∪ S2 = {1, 2, . . . , n − 1} and S1 ∩ S2 = ∅, and   (6.5)
II′ . there is a multiplier µb such that
S1 µb = S2 and S2 µb = S1 .   (6.6)
Proof: The previous discussion proved that if C 1 and C 2 are a pair of even-like duadic
codes, then I′ and II′ hold. Suppose that I′ and II′ hold. Because 0 ∈ Ti , C i is even-like, by
Exercise 238, for i = 1 and 2. Let ei (x) be the generating idempotent of C i . As C 1 ∩ C 2
has defining set T1 ∪ T2 = {0, 1, . . . , n − 1} by Exercise 239 and (6.5), C 1 ∩ C 2 = {0}.
By Theorem 4.3.7, C 1 ∩ C 2 has generating idempotent e1 (x)e2 (x), which therefore must
be 0. As C 1 + C 2 has defining set T1 ∩ T2 = {0} by Exercise 239 and (6.5), C 1 + C 2 =
E n . By Theorem 4.3.7, C 1 + C 2 has generating idempotent e1 (x) + e2 (x) − e1 (x)e2 (x) =
e1 (x) + e2 (x), which therefore must be 1 − j(x) by Lemma 6.1.1(iii). Thus (6.1) holds. By
Corollary 4.4.5, C i µb−1 has defining set Ti µb for i = 1 and 2. But by (6.6), T1 µb = T2 and
T2 µb = T1 . Thus C 1 µb−1 = C 2 and C 2 µb−1 = C 1 , giving (6.2) with a = b−1 . Therefore C 1
and C 2 are a pair of even-like duadic codes.
We can use either our original definitions I and II for duadic codes defined in terms of
their idempotents or I′ and II′ from Theorem 6.1.5. We give a name to conditions I′ and II′ :
we say that a pair of sets S1 and S2 , each of which is a union of nonzero q-cyclotomic cosets,
forms a splitting of n given by µb over Fq provided conditions I′ and II′ from Theorem 6.1.5
hold. Note that the proof of Theorem 6.1.5 shows that µa in (6.2) and µb in (6.6) are
related by a = b−1 . In other words, if µa gives a splitting for the duadic codes, then µa −1
gives the associated splitting of n. However, S1 µa −1 = S2 implies S1 = S2 (µa −1 )−1 = S2 µa .
Similarly, S2 = S1 µa , and we can in fact use the same multiplier for the splittings in either
definition.
This theorem has an immediate corollary.
Corollary 6.1.6 Duadic codes of length n over Fq exist if and only if there is a multiplier
which gives a splitting of n.
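Corollary 6.1.6 suggests a brute-force search: list the nonzero q-cyclotomic cosets mod n and try every way of assembling them into a splitting. A sketch (names ours); for n = 11 and q = 3 it recovers the splitting used in Example 6.1.7 below:

```python
from math import gcd
from itertools import combinations

def cyclotomic_cosets(n, q):
    """The nonzero q-cyclotomic cosets modulo n."""
    left, cosets = set(range(1, n)), []
    while left:
        s, c = min(left), set()
        while s not in c:
            c.add(s)
            s = s * q % n
        cosets.append(c)
        left -= c
    return cosets

def splittings(n, q):
    """All (S1, S2, b) with S1, S2 unions of nonzero q-cyclotomic cosets,
    S1 u S2 = {1, ..., n-1}, S1 n S2 empty, S1 mu_b = S2 and S2 mu_b = S1."""
    cosets = cyclotomic_cosets(n, q)
    found = []
    for k in range(1, len(cosets)):
        for choice in combinations(range(len(cosets)), k):
            s1 = set().union(*(cosets[i] for i in choice))
            if len(s1) != (n - 1) // 2:
                continue
            s2 = set(range(1, n)) - s1
            for b in range(2, n):
                if gcd(b, n) == 1 and {x * b % n for x in s1} == s2 \
                        and {x * b % n for x in s2} == s1:
                    found.append((s1, s2, b))
                    break
    return found

for s1, s2, b in splittings(11, 3):
    print(sorted(s1), sorted(s2), f"mu_{b}")
```

Each unordered splitting appears twice, once per ordering of S1 and S2, and only the smallest multiplier that works is reported.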
Example 6.1.7 We construct the generating idempotents of the duadic codes of length 11
over F3 . We first use the splittings of 11 over F3 to show that there is only one pair of even-like duadic codes. The 3-cyclotomic cosets modulo 11 are C0 = {0}, C1 = {1, 3, 9, 5, 4},
and C2 = {2, 6, 7, 10, 8}. The only possible splitting of 11 is S1 = C1 and S2 = C2 , since
S1 and S2 must contain five elements. Thus there is only one pair of even-like duadic
codes. We now construct their idempotents. Let i 0 (x) = 1, i 1 (x) = x + x 3 + x 4 + x 5 + x 9 ,
and i 2 (x) = x 2 + x 6 + x 7 + x 8 + x 10 . By Corollary 4.3.15, all idempotents are of the form
a0 i 0 (x) + a1 i 1 (x) + a2 i 2 (x), where a0 , a1 , a2 ∈ F3 . By Exercise 330, i 1 (x)^2 = −i 1 (x). Thus
(1 + i 1 (x))^2 = 1 + 2i 1 (x) − i 1 (x) = 1 + i 1 (x) and 1 + i 1 (x) is an even-like idempotent.
As i 1 (x)µ2 = i 2 (x), then i 2 (x)^2 = −i 2 (x) and 1 + i 2 (x) is another even-like idempotent.
Letting e1 (x) = 1 + i 1 (x) and e2 (x) = 1 + i 2 (x), we see that e1 (x) + e2 (x) = 1 − j(x) as
j(x) = 2(1 + x + x 2 + · · · + x 10 ), giving (6.1). Also e1 (x)µ2 = e2 (x) and e2 (x)µ2 = e1 (x)
giving (6.3). Thus e1 (x) and e2 (x) are the idempotent generators of the unique pair of even-like duadic codes. The corresponding generating idempotents for the odd-like duadic codes
are 1 − e2 (x) = −i 2 (x) and 1 − e1 (x) = −i 1 (x). These odd-like codes are the ternary Golay
codes.
Exercise 330 In R11 over F3 show that (x + x 3 + x 4 + x 5 + x 9 )^2 = −(x + x 3 + x 4 +
x 5 + x 9 ).
Exercise 331 Find the generating idempotents of the duadic codes of length n = 23 over
F3 .
Exercise 332 Find the splittings of n = 13 over F3 to determine the number of pairs
of even-like duadic codes. Using these splittings, find those codes which are permutation
equivalent using Theorem 4.3.17 and Corollary 4.4.5.
Example 6.1.8 We construct the generating idempotents of the duadic codes of length 5
over F4 . The 4-cyclotomic cosets modulo 5 are C0 = {0}, C1 = {1, 4}, and C2 = {2, 3}.
The only possible splitting of 5 over F4 is S1 = C1 and S2 = C2 , since S1 and S2 must
contain two elements. Thus there is only one pair of even-like duadic codes. Let e1 (x)
and e2 (x) be their idempotents; let i 0 (x) = 1, i 1 (x) = x + x 4 , and i 2 (x) = x 2 + x 3 . By
Corollary 4.3.15, all idempotents are of the form e(x) = a0 i 0 (x) + a1 i 1 (x) + a2 i 2 (x), where
a0 , a1 , a2 ∈ F4 . Since e(x)^2 = a0^2 i 0 (x) + a2^2 i 1 (x) + a1^2 i 2 (x) = e(x), we must have a0^2 = a0
and a2 = a1^2 ; thus a0 = 0 or 1. Since µ4 fixes cyclic codes over F4 by Theorem 4.3.13, the
only multipliers that could interchange two cyclic codes are µ2 or µ3 . Since µ3 = µ4 µ2 ,
we can assume that µ2 interchanges the two even-like duadic codes. Suppose that
e1 (x) = a0 i 0 (x) + a1 i 1 (x) + a12 i 2 (x). Then e2 (x) = e1 (x)µ2 = a0 i 0 (x) + a12 i 1 (x) + a1 i 2 (x)
and e1 (x) + e2 (x) = 1 − j(x) = i 1 (x) + i 2 (x); thus a1 + a1^2 = 1 implying that a1 = ω or
ω̄ = ω^2 . To make ei (x) even-like, a0 = 0. So we can take the idempotents of C 1 and C 2 to
be e1 (x) = ω(x + x 4 ) + ω̄(x 2 + x 3 ) and e2 (x) = ω̄(x + x 4 ) + ω(x 2 + x 3 ). The associated
odd-like duadic codes have idempotents 1 + ω(x + x 4 ) + ω̄(x 2 + x 3 ) and 1 + ω̄(x + x 4 ) +
ω(x 2 + x 3 ); these codes are each the punctured hexacode (see Exercise 363).
Example 6.1.9 For comparison with the previous example, we construct the generating
idempotents of the duadic codes of length 7 over F4 . The codes in this example all
turn out to be quadratic residue codes. The 4-cyclotomic cosets modulo 7 are C0 = {0},
C1 = {1, 4, 2}, and C3 = {3, 5, 6}. Again the only possible splitting of 7 over F4 is
S1 = C1 and S2 = C3 , and there is only one pair of even-like duadic codes with idempotents e1 (x) and e2 (x). Let i 0 (x) = 1, i 1 (x) = x + x 2 + x 4 , and i 2 (x) = x 3 + x 5 + x 6 . As in
the previous example all idempotents are of the form e(x) = a0 i 0 (x) + a1 i 1 (x) + a2 i 2 (x),
where a0 , a1 , a2 ∈ F4 . However, now e(x)^2 = a0^2 i 0 (x) + a1^2 i 1 (x) + a2^2 i 2 (x) = e(x); we must
have a_j^2 = a_j for 0 ≤ j ≤ 2 and hence a_j = 0 or 1. Similarly to Example 6.1.8 we can
assume that µ3 interchanges the two even-like duadic codes; see also Exercise 333. If
e1 (x) = a0 i 0 (x) + a1 i 1 (x) + a2 i 2 (x), then e2 (x) = e1 (x)µ3 = a0 i 0 (x) + a2 i 1 (x) + a1 i 2 (x)
and e1 (x) + e2 (x) = 1 − j(x) = i 1 (x) + i 2 (x). Thus a1 + a2 = 1 implying that {a1 , a2 } =
{0, 1}. To make ei (x) even-like, a0 = 1. So the idempotents of C 1 and C 2 are 1 + x + x 2 + x 4
and 1 + x 3 + x 5 + x 6 . The associated odd-like duadic codes have idempotents x 3 + x 5 + x 6
and x + x 2 + x 4 . These are all binary idempotents; the subfield subcodes C i |F2 and Di |F2 are
precisely the codes from Example 6.1.4. This is an illustration of Theorem 6.6.4, which
applies to quadratic residue codes.
The definition of duadic codes in terms of generating idempotents has the advantage that
these idempotents can be constructed with the knowledge gained from the splittings of n
over the fields F_2 and F_4 without factoring x^n − 1, as we saw in the preceding examples.
With more difficulty, the generating idempotents can also sometimes be constructed over
the field F_3 without factoring x^n − 1.
Exercise 333 In this exercise, we examine which multipliers need to be checked to either
interchange duadic codes (as in II of the duadic code definition) or produce a splitting (as
in II′ of Theorem 6.1.5). Suppose our codes are of length n over Fq , where of course, n
and q are relatively prime. The multipliers µa that need to be considered are indexed by
the elements of the (multiplicative) group Z#n = {a ∈ Zn | gcd(a, n) = 1}. Let Q be the
subgroup of Z#n generated by q. Prove the following:
(a) If C 1 µa = C 2 and C 2 µa = C 1 (from II of the duadic code definition), then C 1 µc µa = C 2
and C 2 µc µa = C 1 for all c ∈ Q.
(b) If S1 µb = S2 and S2 µb = S1 (from II′ of Theorem 6.1.5), then S1 µc µb = S2 and
S2 µc µb = S1 for all c ∈ Q.
(c) Prove that when checking II of the duadic code definition or II′ of Theorem 6.1.5, one
only needs to check multipliers indexed by one representative from each coset of Q in
Z#n , and, in fact, the representative from Q itself need not be checked.
(d) What are the only multipliers that need to be considered when constructing duadic
codes of length n over Fq where:
6.1 Definition and basic properties
(i) n = 5, q = 4,
(ii) n = 7, q = 4,
(iii) n = 15, q = 4,
(iv) n = 13, q = 3, and
(v) n = 23, q = 2?
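The reduction in Exercise 333 is easy to carry out by machine. A small Python sketch (the function name and structure are ours, not from the text) lists one multiplier representative per coset of Q = ⟨q⟩ in Z_n^#, skipping the coset Q itself:

```python
from math import gcd

def multiplier_reps(n, q):
    """One multiplier representative per coset of Q = <q> in
    Z_n^# = {a : gcd(a, n) = 1}, skipping the coset Q itself."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    # build the subgroup Q generated by q modulo n
    Q, a = set(), 1
    while True:
        a = a * q % n
        if a in Q:
            break
        Q.add(a)
    reps, seen = [], set()
    for a in units:
        coset = frozenset(a * c % n for c in Q)
        if coset not in seen:
            seen.add(coset)
            if coset != frozenset(Q):   # Q itself need not be checked
                reps.append(min(coset))
    return reps

print(multiplier_reps(7, 4))   # part (ii): n = 7, q = 4
print(multiplier_reps(23, 2))  # part (v):  n = 23, q = 2
```

For n = 7, q = 4 this leaves only µ_3, and for n = 23, q = 2 only µ_5, consistent with the examples in this section.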
Exercise 334 Find the generating idempotents of the duadic codes of length n over F4
where:
(a) n = 3,
(b) n = 9,
(c) n = 11,
(d) n = 13, and
(e) n = 23.
If n is the length of our duadic codes, a splitting of n leads directly to the defining sets
of the even-like and odd-like duadic codes, and hence the generator polynomials, once the
primitive nth root of unity has been fixed. From there one can construct the generating
idempotents, by using, for example, the Euclidean Algorithm and the technique in the proof
of Theorem 4.3.2. In the binary case, by examining the exponents of two of these four
idempotents another splitting of n is obtained as the next theorem shows. This provides
a way to use splittings of n to obtain generating idempotents of duadic codes directly in
the binary case. It is important to note that the splitting of n used to construct the defining
sets and generator polynomials is not necessarily the same splitting as the one arising from
the exponents of the idempotents; however, the multiplier used to give these splittings is
the same. In Theorem 6.3.3, we will see that binary duadic codes exist only if n ≡ ±1
(mod 8).
Theorem 6.1.10 Let n ≡ ±1 (mod 8) and let e1 (x) and e2 (x) be generating idempotents of
even-like binary duadic codes of length n given by the multiplier µa . The following hold:
(i) If n ≡ 1 (mod 8), then e_i(x) = ∑_{j∈S_i} x^j, where S_1 and S_2 form a splitting of n given
by µ_a.
(ii) If n ≡ −1 (mod 8), then 1 + e_i(x) = ∑_{j∈S_i} x^j, where S_1 and S_2 form a splitting of n
given by µ_a.
Proof: Since µa is the multiplier giving the splitting of (6.2), from (6.3) e1 (x)µa = e2 (x)
and e2 (x)µa = e1 (x). Then by (6.1),
e_1(x) + e_2(x) = ∑_{j=1}^{n−1} x^j.   (6.7)

By Corollary 4.3.15,

e_i(x) = ε_i + ∑_{j∈S_i} x^j,   (6.8)
where ε_i is 0 or 1 and S_i is a union of nonzero 2-cyclotomic cosets modulo n. Combining
(6.7) and (6.8), we have

ε_1 + ε_2 + ∑_{j∈S_1} x^j + ∑_{j∈S_2} x^j = ∑_{j=1}^{n−1} x^j,

which implies that ε_1 = ε_2, S_1 ∩ S_2 = ∅, and S_1 ∪ S_2 = {1, 2, . . . , n − 1}. Furthermore,
as e_1(x)µ_a = e_2(x), (ε_1 + ∑_{j∈S_1} x^j)µ_a = ε_1 + ∑_{j∈S_1} x^{ja} = ε_2 + ∑_{j∈S_2} x^j, implying that
S_1µ_a = S_2. Analogously, S_2µ_a = S_1 using e_2(x)µ_a = e_1(x). Thus S_1 and S_2 form a splitting
of n given by µ_a. Now e_1(x) is even-like, which in the binary case means that e_1(x)
has even weight. The weight of e_1(x) is the size of S_1 if ε_1 = 0, and is one plus the size
of S_1 if ε_1 = 1. The size of S_1 is (n − 1)/2, which is even if n ≡ 1 (mod 8) and odd if
n ≡ −1 (mod 8). So ε_1 = 0 if n ≡ 1 (mod 8) and ε_1 = 1 if n ≡ −1 (mod 8), leading to (i)
and (ii).
The converse also holds.
Theorem 6.1.11 Let n ≡ ±1 (mod 8) and let S1 and S2 be a splitting of n over F2 given
by µa . The following hold:
(i) If n ≡ 1 (mod 8), then e_i(x) = ∑_{j∈S_i} x^j with i = 1 and 2 are generating idempotents
of an even-like pair of binary duadic codes with splitting given by µ_a.
(ii) If n ≡ −1 (mod 8), then e_i(x) = 1 + ∑_{j∈S_i} x^j with i = 1 and 2 are generating idempotents of an even-like pair of binary duadic codes with splitting given by µ_a.
Proof: Define e_i(x) = ε + ∑_{j∈S_i} x^j for i = 1 and 2, where ε = 0 if n ≡ 1 (mod 8) and
ε = 1 if n ≡ −1 (mod 8). Then as S_i has size (n − 1)/2, e_i(x) has even weight and hence
is even-like. Furthermore, e_i(x) is an idempotent as {0} ∪ S_i is a union of 2-cyclotomic
cosets, by Corollary 4.3.15. As S_1µ_a = S_2 and S_2µ_a = S_1, e_1(x)µ_a = e_2(x) and e_2(x)µ_a =
e_1(x), showing (6.3), which is equivalent to (6.2). Because S_1 ∪ S_2 = {1, 2, . . . , n − 1} and
S_1 ∩ S_2 = ∅, e_1(x) + e_2(x) = 2ε + ∑_{j=1}^{n−1} x^j = ∑_{j=1}^{n−1} x^j, giving (6.1).
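Theorems 6.1.10 and 6.1.11 translate directly into a computation. The sketch below (helper names are ours) builds e_i(x) from a splitting over F_2 and checks idempotency; with n = 7 and S_1 = {1, 2, 4} it recovers the idempotent 1 + x + x^2 + x^4 seen in Example 6.1.9:

```python
def cyclotomic_cosets(n, q=2):
    """Partition {0, ..., n-1} into q-cyclotomic cosets modulo n."""
    left, cosets = set(range(n)), []
    while left:
        s, coset = min(left), set()
        while s not in coset:
            coset.add(s)
            s = s * q % n
        cosets.append(sorted(coset))
        left -= coset
    return cosets

def even_like_idempotent(n, S):
    """e(x) of Theorems 6.1.10/6.1.11 as a coefficient vector:
    sum of x^j over j in S, plus 1 when n = -1 (mod 8)."""
    e = [0] * n
    for j in S:
        e[j] = 1
    if n % 8 == 7:
        e[0] ^= 1
    return e

def is_idempotent(e, n):
    """Check e(x)^2 = e(x) in F_2[x]/(x^n - 1); in characteristic 2
    squaring simply doubles each exponent."""
    sq = [0] * n
    for i, ei in enumerate(e):
        if ei:
            sq[2 * i % n] ^= 1
    return sq == e

print(cyclotomic_cosets(7))            # [[0], [1, 2, 4], [3, 5, 6]]
e1 = even_like_idempotent(7, {1, 2, 4})
print(e1, is_idempotent(e1, 7))        # coefficients of 1 + x + x^2 + x^4
```

The even weight of e1 confirms that the code it generates is even-like, as the proof of Theorem 6.1.11 requires.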
Exercise 335 Use Theorems 6.1.10 and 6.1.11 to find all the idempotents of the even-like
binary duadic codes of length 73 where µa = µ−1 .
The proof of the following theorem is straightforward and is left as an exercise.
Theorem 6.1.12 Let µc be any multiplier. The pairs C 1 , C 2 , and D1 , D2 are associated
pairs of even-like and odd-like duadic codes with splitting S1 , S2 if and only if C 1 µc , C 2 µc ,
and D1 µc , D2 µc are associated pairs of even-like and odd-like duadic codes with splitting
S1 µc−1 , S2 µc−1 .
Exercise 336 Prove Theorem 6.1.12.
As we will see in Section 6.6, quadratic residue codes are duadic codes of prime length
n = p for which S_1 is the set of nonzero quadratic residues (that is, squares) in F_p, S_2 is the
set of nonresidues (that is, nonsquares), and the multiplier interchanging the codes is µa ,
where a is any nonresidue. Quadratic residue codes exist only for prime lengths, but there
are duadic codes of composite length. At prime lengths there may be duadic codes that
are not quadratic residue codes. Duadic codes possess many of the properties of quadratic
residue codes. For example, all the duadic codes of Examples 6.1.4, 6.1.7, 6.1.8, and 6.1.9
and Exercises 328 and 331 are quadratic residue codes. In Exercises 332 and 334, some of
the codes are quadratic residue codes and some are not.
6.2
A bit of number theory
In this section we digress to present some results in number theory that we will need in the
remainder of this chapter. We begin with a general result that will be used in this section
and the next. Its proof can be found in [195, Theorem 4–11]. In a ring with unity, a unit is
an element with a multiplicative inverse. The units in a ring with unity form a group.
Lemma 6.2.1 Let n > 1 be an integer and let Un be the group of units in Zn . Then:
(i) Un has order φ(n), where φ is the Euler totient first described in Section 3.3.
(ii) U_n is cyclic if and only if n = 2, 4, p^t, or 2p^t, where p is an odd prime.
Note that the units in Un are often called reduced residues modulo n, and if Un is cyclic, its
generator is called a primitive root. We will need to know when certain numbers are squares
modulo an odd prime p.
Lemma 6.2.2 Let p be an odd prime and let a be in Z_p with a ≢ 0 (mod p). The following
are equivalent:
(i) a is a square.
(ii) The (multiplicative) order of a is a divisor of (p − 1)/2.
(iii) a^{(p−1)/2} ≡ 1 (mod p).
Furthermore, if a is not a square, then a^{(p−1)/2} ≡ −1 (mod p).
Proof: Since Z_p is the field F_p, the group Z_p^* of nonzero elements of Z_p is a cyclic
group by Theorem 3.3.1 (or Lemma 6.2.1); let α be a generator of this cyclic group. The
nonzero squares in Z_p form a multiplicative group Q with generator α^2, an element of order
(p − 1)/2. Thus the nonzero squares have orders dividing (p − 1)/2, showing that (i)
implies (ii). Suppose that a has order d, a divisor of (p − 1)/2. So a^d ≡ 1 (mod p) and if
m = (p − 1)/2, then 1 ≡ (a^d)^{m/d} ≡ a^{(p−1)/2} (mod p); hence (ii) implies (iii). If (iii) holds,
then a has order a divisor d of (p − 1)/2. Therefore, as there is a unique subgroup of order
d in any cyclic group and Q contains such a subgroup, a ∈ Q. Thus (iii) implies (i). Since
x^2 ≡ 1 (mod p) has only two solutions ±1 in a field of odd characteristic and (a^{(p−1)/2})^2 ≡ 1
(mod p), if a is not a square, then a^{(p−1)/2} ≡ −1 (mod p) by (iii).
By this lemma, to check whether a is a square modulo p, one only checks whether
a^{(p−1)/2} is 1 or −1. We now show that if a is a square modulo an odd prime p, then it is a
square modulo any prime power p^t, where t > 0.
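Lemma 6.2.2 is exactly Euler's criterion, which Python's three-argument pow makes cheap to apply (the function name is ours):

```python
def is_square_mod_p(a, p):
    """Euler's criterion: a (not divisible by p) is a square modulo the
    odd prime p exactly when a^((p-1)/2) = 1 (mod p)."""
    assert p % 2 == 1 and a % p != 0
    return pow(a, (p - 1) // 2, p) == 1

# the quadratic residues modulo 11
print([a for a in range(1, 11) if is_square_mod_p(a, 11)])
```

For p = 11 this lists the residues 1, 3, 4, 5, 9, i.e. the squares 1^2, 5^2, 2^2, 4^2, 3^2 reduced modulo 11.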
Lemma 6.2.3 Let p be an odd prime and t a positive integer. Then a is a square modulo p
if and only if a is a square modulo p^t.
Proof: If a ≡ b^2 (mod p^t), then a ≡ b^2 (mod p). For the converse, suppose that a is a
square modulo p. By Lemma 6.2.2, a^{(p−1)/2} ≡ 1 (mod p). By Exercise 337,

a^{(p−1)p^{t−1}/2} ≡ (a^{(p−1)/2})^{p^{t−1}} ≡ 1 (mod p^t).   (6.9)
Let U be the units in Z_{p^t}. This group is cyclic of order φ(p^t) = (p − 1)p^{t−1} by Lemma 6.2.1.
In particular, U has even order as p is odd. Let β be a generator of U. The set of squares
in U is a subgroup R of order φ(p^t)/2 generated by β^2. This is the unique subgroup of
that order and contains all elements of orders dividing φ(p^t)/2. Therefore by (6.9), a ∈ R,
completing the proof.
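Lemma 6.2.3 can be sanity-checked by brute force for a small prime power (the helper below is ours):

```python
def squares_mod(m):
    """Nonzero squares modulo m."""
    return {x * x % m for x in range(m)} - {0}

p, t = 7, 3
small = squares_mod(p)                                  # squares modulo 7
lifted = {a % p for a in squares_mod(p ** t) if a % p}  # unit squares mod 343, reduced mod 7
print(small, small == lifted)
```

Both sets come out to {1, 2, 4}: every unit square modulo 7^3 reduces to a square modulo 7, and every square modulo 7 lifts.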
Exercise 337 Prove that if p is a prime, x is an integer, and t is a positive integer, then
(1 + xp)^{p^{t−1}} = 1 + yp^t for some integer y.
We will be interested in the odd primes for which −1, 2, and 3 are squares modulo p.
Lemma 6.2.4 Let p be an odd prime. Then −1 is a square modulo p if and only if p ≡ 1
(mod 4).
Proof: As p is odd, p ≡ ±1 (mod 4). Suppose that p = 4r + 1, where r is an integer. Then
(−1)^{(p−1)/2} = (−1)^{2r} = 1; by Lemma 6.2.2, −1 is a square modulo p. Suppose now that
p = 4r − 1, where r is an integer. Then (−1)^{(p−1)/2} = (−1)^{2r−1} = −1; by Lemma 6.2.2,
−1 is not a square modulo p.
Lemma 6.2.5 Let p be an odd prime. Then 2 is a square modulo p if and only if p ≡ ±1
(mod 8).
Proof: We prove only the case p = 8r + 1, where r is an integer, and leave the cases
p = 8r − 1 and p = 8r ± 3 as exercises. Let b = 2 · 4 · 6 · · · 8r = 2^{4r}(4r)!. Then
b = 2 · 4 · 6 · · · 4r · [p − (4r − 1)][p − (4r − 3)] · · · (p − 1).
Considering b modulo p, we obtain 2^{4r}(4r)! ≡ b ≡ (−1)^{2r}(4r)! (mod p). Thus 2^{(p−1)/2} =
2^{4r} ≡ (−1)^{2r} ≡ 1 (mod p). By Lemma 6.2.2, 2 is a square modulo p.
Exercise 338 Complete the proof of Lemma 6.2.5 by examining the cases p = 8r − 1 and
p = 8r ± 3, where r is an integer.
We will need to know information about ord p (2) for later work. Using Lemma 6.2.5, we
obtain the following result.
Lemma 6.2.6 Let p be an odd prime. The following hold:
(i) If p ≡ −1 (mod 8), then ord p (2) is odd.
(ii) If p ≡ 3 (mod 8), then 2 | ord p (2) but 4 ∤ ord p (2).
(iii) If p ≡ −3 (mod 8), then 4 | ord p (2) but 8 ∤ ord p (2).
Proof: For (i), let p = 8r − 1, where r is an integer. By Lemmas 6.2.2 and 6.2.5, 2^{(p−1)/2} =
2^{4r−1} ≡ 1 (mod p). So ord_p(2) is a divisor of 4r − 1, which is odd. Part (i) follows. For (ii),
let p = 8r + 3, where r is an integer. By Lemmas 6.2.2 and 6.2.5, 2^{(p−1)/2} = 2^{4r+1} ≡ −1
(mod p). So as ord_p(2) is always a divisor of p − 1 = 8r + 2 = 2(4r + 1) but not a divisor
of 4r + 1, 2 | ord_p(2); since 4 ∤ (p − 1), 4 ∤ ord_p(2), yielding part (ii). Finally for (iii),
let p = 8r − 3, where r is an integer. By Lemmas 6.2.2 and 6.2.5, 2^{(p−1)/2} = 2^{4r−2} ≡
−1 (mod p). So as ord_p(2) is always a divisor of p − 1 = 8r − 4 = 4(2r − 1) but not
a divisor of 4r − 2 = 2(2r − 1), 4 | ord_p(2); since 8 ∤ (p − 1), 8 ∤ ord_p(2), showing (iii)
holds.
Note that Lemma 6.2.6 does not address the case p ≡ 1 (mod 8) because ord p (2) can
have various powers of 2 as a factor depending on p.
Corollary 6.2.7 Let p be an odd prime. The following hold:
(i) If p ≡ −1 (mod 8), then ord p (4) = ord p (2) and hence is odd.
(ii) If p ≡ 3 (mod 8), then ord p (4) is odd.
(iii) If p ≡ −3 (mod 8), then 2 | ord p (4) but 4 ∤ ord p (4).
Exercise 339 Prove Corollary 6.2.7.
Exercise 340 Compute ord p (2) and ord p (4) when 3 ≤ p < 100 and p is prime. Compare
the results to those implied by Lemma 6.2.6 and Corollary 6.2.7.
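Exercise 340 is quickly done by machine; a brute-force order computation (the helper is ours) suffices for primes of this size:

```python
def order_mod(a, p):
    """Multiplicative order of a modulo p (gcd(a, p) = 1 assumed)."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

# tabulate ord_p(2) and ord_p(4) for a few small odd primes
for p in [7, 11, 13, 17, 19, 23]:
    print(p, p % 8, order_mod(2, p), order_mod(4, p))
```

For example, ord_7(2) = 3 is odd (7 ≡ −1 (mod 8)), while ord_13(2) = 12 is divisible by 4 but not 8 (13 ≡ −3 (mod 8)), exactly as Lemma 6.2.6 predicts.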
In order to discover when 3 is a square modulo p, it is simplest to use quadratic reciprocity.
To do that we define the Legendre symbol

    (a/p) =  1 if a is a nonzero square modulo p,
            −1 if a is a nonsquare modulo p.
By Lemma 6.2.2, (a/p) ≡ a^{(p−1)/2} (mod p). The Law of Quadratic Reciprocity, a proof
of which can be found in [195], allows someone to decide if q is a square modulo p by
determining if p is a square modulo q whenever p and q are distinct odd primes.
Theorem 6.2.8 (Law of Quadratic Reciprocity) Suppose p and q are distinct odd
primes. Then

    (p/q)(q/p) = (−1)^{((p−1)/2)((q−1)/2)}.
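The Law of Quadratic Reciprocity can also be spot-checked numerically; the sketch below computes Legendre symbols via Euler's criterion (an illustrative helper, not a full Jacobi-symbol implementation):

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

# verify reciprocity on a few pairs of distinct odd primes
for p, q in [(3, 5), (3, 7), (5, 7), (7, 11), (11, 13)]:
    lhs = legendre(p, q) * legendre(q, p)
    rhs = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    assert lhs == rhs
print("reciprocity holds on all test pairs")
```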
Lemma 6.2.9 Let p ≠ 3 be an odd prime. Then 3 is a square modulo p if and only if
p ≡ ±1 (mod 12).
Proof: We wish to find (3/p). As p is an odd prime, p = 12r ± 1 or p = 12r ± 5, where r
is an integer. Suppose that p = 12r + 1. Then p ≡ 1 ≡ 1^2 (mod 3) and so (p/3) = 1. Also,

    (−1)^{((p−1)/2)((3−1)/2)} = (−1)^{6r} = 1.

By the Law of Quadratic Reciprocity, (3/p) = 1. The other cases are left as an exercise.
Exercise 341 Complete the proof of Lemma 6.2.9 by examining the cases p = 12r − 1
and p = 12r ± 5, where r is an integer.
Exercise 342 Make a table indicating when −1, 2, and 3 are squares modulo p, where p
is a prime with 3 < p < 100.
Exercise 343 Show that if p is an odd prime, then

    (2/p) = (−1)^{(p^2−1)/8}.
6.3
Existence of duadic codes
The existence of duadic codes for a given length n depends on the existence of a splitting
of n by Corollary 6.1.6. In this section we determine, for each q, the integers n for which
splittings exist. The following lemma reduces the problem to the case where n is a prime
power; we sketch the proof leaving the details to Exercise 344.
Lemma 6.3.1 Let n = n_1 n_2, where gcd(n_1, n_2) = 1. There is a splitting of n given by µ_a
if and only if there are splittings of n_1 and n_2 given by µ_{a mod n_1} and µ_{a mod n_2}, respectively.
Furthermore, q is a square modulo n if and only if q is a square modulo n_1 and a square
modulo n_2.
Proof: Since gcd(n 1 , n 2 ) = 1, it follows from the Chinese Remainder Theorem [195] that
zθ = (z mod n 1 , z mod n 2 ) defines a ring isomorphism θ from Zn onto Zn 1 × Zn 2 . The
second assertion follows from this observation.
For the first assertion, let Z 1 = {(z, 0) | z ∈ Zn 1 } and Z 2 = {(0, z) | z ∈ Zn 2 }. For i = 1
and 2, the projections π_i : Z_i → Z_{n_i} are ring isomorphisms. If C is a q-cyclotomic coset
modulo n and Cθ ∩ Z_i ≠ ∅, then (Cθ ∩ Z_i)π_i is a q-cyclotomic coset modulo n_i. Let S_1 and S_2
form a splitting of n given by µ_b. Then (S_1θ ∩ Z_i)π_i and (S_2θ ∩ Z_i)π_i form a splitting of
n_i given by µ_{b_i}, where b_i ≡ b (mod n_i).
Conversely, let S_{1,i} and S_{2,i} form a splitting of n_i given by µ_{b_i} for i = 1 and 2. Then
((S_{1,1} × Z_{n_2}) ∪ ({0} × S_{1,2}))θ^{−1} and ((S_{2,1} × Z_{n_2}) ∪ ({0} × S_{2,2}))θ^{−1} form a splitting of n
given by µ_b, where b = (b_1, b_2)θ^{−1}.
Exercise 344 In this exercise, fill in some of the details of Lemma 6.3.1 by doing the
following (where n 1 and n 2 are relatively prime and n = n 1 n 2 ):
(a) Prove that zθ = (z mod n 1 , z mod n 2 ) defines a ring isomorphism θ from Zn onto
Zn 1 × Zn 2 .
(b) Prove that if C is a q-cyclotomic coset modulo n and Cθ ∩ Z_i ≠ ∅, then (Cθ ∩ Z_i)π_i
is a q-cyclotomic coset modulo n_i.
(c) Prove that if S_1 and S_2 form a splitting of n given by µ_b, then (S_1θ ∩ Z_i)π_i and
(S_2θ ∩ Z_i)π_i form a splitting of n_i given by µ_{b_i}, where b_i ≡ b (mod n_i).
(d) Let S_{1,i} and S_{2,i} form a splitting of n_i given by µ_{b_i} for i = 1 and 2. Prove that ((S_{1,1} ×
Z_{n_2}) ∪ ({0} × S_{1,2}))θ^{−1} and ((S_{2,1} × Z_{n_2}) ∪ ({0} × S_{2,2}))θ^{−1} form a splitting of n given
by µ_b, where b = (b_1, b_2)θ^{−1}.
We are now ready to give the criteria for the existence of duadic codes of length n over
Fq .
Theorem 6.3.2 Duadic codes of length n over Fq exist if and only if q is a square modulo n.
Proof: By Lemma 6.3.1, we may assume that n = p m , where p is an odd prime. We first
show that if a splitting of n = p m exists, then q is a square modulo n. Let U be the group
of units in Zn . By Lemma 6.2.1 this group is cyclic of order φ( p m ) = ( p − 1) p m−1 , which
is even as p is odd. Since q is relatively prime to n, q ∈ U . Let R be the subgroup of U
consisting of the squares in U . We only need to show that q ∈ R. If u generates U , then
u^2 generates R. As U has even order, R has index 2 in U. Since q ∈ U, define Q to be the
subgroup of U generated by q. Notice that if a ∈ U , then aq ∈ U and hence U is a union of
q-cyclotomic cosets modulo n; in fact, the q-cyclotomic cosets contained in U are precisely
the cosets of Q in U . The number of q-cyclotomic cosets in U is then the index |U : Q| of
Q in U . Let S1 and S2 form a splitting of n given by µb . Each q-cyclotomic coset of U is
in precisely one Si as U ⊆ S1 ∪ S2 and S1 ∩ S2 = ∅. Because b and n are relatively prime,
b ∈ U and so U µb = U implying that (U ∩ S1 )µb = U ∩ S2 . In particular, this says that
U has an even number of q-cyclotomic cosets. Thus |U : Q| is even; as |U : R| = 2 and U
is cyclic, Q ⊆ R. Thus q ∈ R as desired.
Now assume that q is a square modulo n. We show how to construct a splitting of n. For
1 ≤ t ≤ m, let Ut be the group of units in Z pt . Let Rt be the subgroup of Ut consisting of
the squares of elements in Ut and let Q t be the subgroup of Ut generated by q. As in the
previous paragraph, Ut is cyclic of even order, and Rt has index 2 in Ut . As q is a square
modulo p m , then q is a square modulo p t implying that Q t ⊆ Rt . Finally, Ut is a union of
q-cyclotomic cosets modulo p t and these are precisely the cosets of Q t in Ut . The nonzero
elements of Z_n are the set

    ⋃_{t=1}^{m} p^{m−t} U_t.   (6.10)
We are now ready to construct the splitting of n. Since U_t is cyclic and Q_t ⊆ R_t ⊆ U_t with
|U_t : R_t| = 2, there is a unique subgroup K_t of U_t containing Q_t such that |K_t : Q_t| = 2.
Note that U_t, R_t, Q_t, and K_t can be obtained from U_m, R_m, Q_m, and K_m by reducing the
latter modulo p^t. Let b ∈ K_m \ Q_m. Then K_m = Q_m ∪ bQ_m, and hence, by reducing modulo p^t, K_t = Q_t ∪ bQ_t for 1 ≤ t ≤ m. Also, b^2 ∈ Q_t modulo p^t. Let g_1^(t), g_2^(t), . . . , g_{i_t}^(t)
be distinct coset representatives of K_t in U_t. Then the q-cyclotomic cosets modulo p^t in
U_t are precisely the cosets g_1^(t)Q_t, g_2^(t)Q_t, . . . , g_{i_t}^(t)Q_t, bg_1^(t)Q_t, bg_2^(t)Q_t, . . . , bg_{i_t}^(t)Q_t. Let
S_{1,t} = g_1^(t)Q_t ∪ g_2^(t)Q_t ∪ · · · ∪ g_{i_t}^(t)Q_t and S_{2,t} = bg_1^(t)Q_t ∪ bg_2^(t)Q_t ∪ · · · ∪ bg_{i_t}^(t)Q_t. Then
S_{1,t}µ_b = S_{2,t} and S_{2,t}µ_b = S_{1,t} as b^2 ∈ Q_t modulo p^t, S_{1,t} ∩ S_{2,t} = ∅, and S_{1,t} ∪ S_{2,t} = U_t.
Note that p^{m−t}g_j^(t)Q_t and p^{m−t}bg_j^(t)Q_t are q-cyclotomic cosets modulo p^m. Thus by (6.10),
S_1 = ⋃_{t=1}^{m} p^{m−t}S_{1,t} and S_2 = ⋃_{t=1}^{m} p^{m−t}S_{2,t} form a splitting of n given by µ_b.
Exercise 345 Show that 2 is a square modulo 49. Use the technique in the proof to construct
a splitting of 49 over F2 .
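Exercise 345 can be attacked computationally. The sketch below (our own helpers) verifies that 2 is a square modulo 49 and then, rather than following the proof's subgroup construction, simply pairs the 2-cyclotomic cosets under the multiplier; it assumes the multiplier is an involution (b^2 ≡ 1 (mod n)), as µ_{−1} is:

```python
def cyclotomic_cosets(n, q):
    """Nonzero q-cyclotomic cosets modulo n."""
    left, cosets = set(range(1, n)), []
    while left:
        s, c = min(left), set()
        while s not in c:
            c.add(s)
            s = s * q % n
        cosets.append(frozenset(c))
        left -= c
    return cosets

def splitting(n, q, b):
    """Pair the nonzero q-cyclotomic cosets mod n under mu_b and return
    (S1, S2), or None if mu_b fixes a coset.  Assumes b^2 = 1 (mod n)."""
    S1, S2 = set(), set()
    for c in cyclotomic_cosets(n, q):
        if c in S1 or c in S2:
            continue
        image = frozenset(j * b % n for j in c)
        if image == c:
            return None          # mu_b fixes this coset: no splitting
        S1.add(c)
        S2.add(image)
    s1 = sorted(j for c in S1 for j in c)
    s2 = sorted(j for c in S2 for j in c)
    return s1, s2

print(pow(10, 2, 49))            # 10^2 = 2 (mod 49), so 2 is a square
s1, s2 = splitting(49, 2, 48)    # mu_{-1}: b = 48 = -1 (mod 49)
print(len(s1), len(s2))
```

Each side of the resulting splitting has 24 elements (one size-21 unit coset plus one size-3 coset of multiples of 7).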
We can now give necessary and sufficient conditions on the length n for the existence of
binary, ternary, and quaternary duadic codes. Recall from Theorem 6.1.3(iii) that n must
be odd.
Theorem 6.3.3 Let n = p_1^{a_1} p_2^{a_2} · · · p_r^{a_r}, where p_1, p_2, . . . , p_r are distinct odd primes. The
following assertions hold:
(i) Duadic codes of length n over F2 exist if and only if pi ≡ ±1 (mod 8) for 1 ≤ i ≤ r .
(ii) Duadic codes of length n over F3 exist if and only if pi ≡ ±1 (mod 12) for 1 ≤ i ≤ r .
(iii) Duadic codes of length n over F4 exist for all (odd ) n.
Proof: Duadic codes of length n over Fq exist if and only if q is a square modulo n by
Theorem 6.3.2. By Lemma 6.3.1, q is a square modulo n if and only if q is a square modulo
p_i^{a_i} for 1 ≤ i ≤ r. By Lemma 6.2.3, q is a square modulo p_i^{a_i} if and only if q is a square
modulo p_i. Part (i) now follows from Lemma 6.2.5 and (ii) from Lemma 6.2.9. Finally, part
(iii) follows from the simple fact that 4 = 2^2 is always a square modulo n.
Exercise 346 Find the integers n, with 3 < n < 200, for which binary and ternary duadic
codes exist.
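Exercise 346 follows mechanically from Theorem 6.3.3; a sketch (function names are ours):

```python
def prime_factors(n):
    """Set of prime factors of n by trial division."""
    fs, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            fs.add(d)
            n //= d
        d += 1
    if n > 1:
        fs.add(n)
    return fs

def duadic_exists(n, modulus, residues):
    """Theorem 6.3.3: n odd and every prime factor of n lies in the
    allowed residue classes (+-1 mod 8 binary, +-1 mod 12 ternary)."""
    return n % 2 == 1 and all(p % modulus in residues
                              for p in prime_factors(n))

binary  = [n for n in range(3, 200) if duadic_exists(n, 8,  {1, 7})]
ternary = [n for n in range(3, 200) if duadic_exists(n, 12, {1, 11})]
print(binary)
print(ternary)
```

The binary list begins 7, 17, 23, 31, 41, 47, 49, 71, ...; note the composite entries such as 49 = 7^2 and 119 = 7 · 17.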
In the binary case, there are results that count the number of duadic codes of prime
length n = p. By Theorem 6.3.3, p ≡ ±1 (mod 8). Let e = ( p − 1)/(2ord p (2)). Then it
can be shown [67] that the number of duadic codes of length p depends only on e. Further,
the number of inequivalent duadic codes also depends only on e. If e is odd, the number of
pairs of odd-like (or even-like) binary duadic codes is 2^{e−1}. When e is even, there is a more
complicated bound for this number. For example, if p = 31, then ord p (2) = 5 and so e = 3.
There are four pairs of odd-like (or even-like) duadic codes. One pair is a pair of quadratic
residue codes; the other three pairs consist of six equivalent codes (see Example 6.4.8).
These facts hold for any length p ≡ ±1 (mod 8) for which e = 3, such as p = 223, 433,
439, 457, and 727.
6.4
Orthogonality of duadic codes
The multiplier µ−1 plays a special role in determining the duals of duadic codes just as it
does for duals of general cyclic codes; see Theorem 4.4.9. We first consider self-orthogonal
codes.
Theorem 6.4.1 Let C be any [n, (n − 1)/2] cyclic code over Fq . Then C is self-orthogonal
if and only if C is an even-like duadic code whose splitting is given by µ−1 .
Proof: Suppose C_1 = C is self-orthogonal with generating idempotent e_1(x). Let C_2 =
⟨e_2(x)⟩, where e_2(x) = e_1(x)µ_{−1}. Since C_1 is self-orthogonal and j(x) is not orthogonal to
itself, j(x) ∉ C_1. Thus by Lemma 6.1.2(iii), C_1 is even-like, which implies that j(x) ∈ C_1^⊥.
By Theorem 4.4.9, C_1^⊥ = ⟨1 − e_1(x)µ_{−1}⟩. As C_1^⊥ has dimension (n + 1)/2 and
C_1 ⊂ C_1^⊥, we have C_1^⊥ = C_1 + ⟨j(x)⟩. Since e_1(x)j(x) = 0 by Lemma 6.1.2(i), it follows
from Theorem 4.3.7 that C_1^⊥ has generating idempotent e_1(x) + j(x) = 1 − e_1(x)µ_{−1} =
1 − e_2(x), giving (6.1). Since e_2(x) = e_1(x)µ_{−1}, e_1(x) = e_2(x)µ_{−1}^{−1} = e_2(x)µ_{−1}, yielding
(6.3). Thus C_1 and C_2 are even-like duadic codes with splitting given by µ_{−1}.
Conversely, let C = C_1 = ⟨e_1(x)⟩ be an even-like duadic code with splitting given by
µ_{−1}. Then C_2 = ⟨e_1(x)µ_{−1}⟩ and C_1 ⊂ D_1 = ⟨1 − e_1(x)µ_{−1}⟩ = C_1^⊥ by Theorems 6.1.3(vi)
and 4.4.9 and (6.4).
By Theorem 6.1.3(iii) and (v), the dimensions of even-like and odd-like duadic codes C i
and Di indicate it is possible (although not necessary) that the dual of C 1 is either D1 or
D2 . The next two theorems describe when these two possibilities occur.
Theorem 6.4.2 If C_1 and C_2 are a pair of even-like duadic codes over F_q, with D_1 and D_2
the associated pair of odd-like duadic codes, the following are equivalent:
(i) C_1^⊥ = D_1.
(ii) C_2^⊥ = D_2.
(iii) C_1µ_{−1} = C_2.
(iv) C_2µ_{−1} = C_1.
Proof: Since C_1µ_a = C_2, C_2µ_a = C_1, D_1µ_a = D_2, and D_2µ_a = D_1 for some a (by definition of the even-like duadic codes and Theorem 6.1.3(vii)), (i) is equivalent to (ii)
by Exercise 35. Parts (iii) and (iv) are equivalent as µ_{−1}^{−1} = µ_{−1}. If (i) holds, C_1 is self-orthogonal by Theorem 6.1.3(vi), implying that (iii) holds by Theorem 6.4.1. Conversely,
if (iii) holds, letting e_i(x) be the generating idempotent for C_i, we have e_1(x)µ_{−1} = e_2(x)
by Theorem 4.3.13. By Theorem 4.4.9, C_1^⊥ = ⟨1 − e_1(x)µ_{−1}⟩ = ⟨1 − e_2(x)⟩ = D_1, yielding
(i).
Theorem 6.4.3 If C_1 and C_2 are a pair of even-like duadic codes over F_q with D_1 and D_2
the associated pair of odd-like duadic codes, the following are equivalent:
(i) C_1^⊥ = D_2.
(ii) C_2^⊥ = D_1.
(iii) C_1µ_{−1} = C_1.
(iv) C_2µ_{−1} = C_2.
Proof: Since C_1µ_a = C_2, C_2µ_a = C_1, D_1µ_a = D_2, and D_2µ_a = D_1 for some a (by definition of the even-like duadic codes and Theorem 6.1.3(vii)), (i) is equivalent to (ii) by
Exercise 35. Let e_i(x) be the generating idempotent of C_i. By Theorem 4.4.9, C_1^⊥ = D_2 if
and only if 1 − e_1(x)µ_{−1} = 1 − e_1(x), if and only if e_1(x)µ_{−1} = e_1(x). So C_1^⊥ = D_2 if and
only if C_1µ_{−1} = C_1 by Theorem 4.3.13. Thus (i) and (iii) are equivalent and, analogously,
(ii) and (iv) are equivalent. The theorem now follows.
Exercise 347 Identify the duadic codes and their duals in Exercises 328, 331, 332, and
335 and in Example 6.1.7.
Results analogous to those in the last three theorems hold with µ−2 in place of µ−1 for
codes over F4 where the orthogonality is with respect to the Hermitian inner product defined
in Section 1.3. These codes are of interest because if a code over F4 has only even weight
codewords, it must be Hermitian self-orthogonal by Theorem 1.4.10. The key to the analogy
is the result of Theorem 4.4.15 that if C is a cyclic code over F_4 with generating idempotent
e(x), then C^{⊥H} has generating idempotent 1 − e(x)µ_{−2}. We leave the proofs as an exercise.
Theorem 6.4.4 Let C be any [n, (n − 1)/2] cyclic code over F4 . Then C is Hermitian
self-orthogonal if and only if C is an even-like duadic code whose splitting is given by µ−2 .
Theorem 6.4.5 If C 1 and C 2 are a pair of even-like duadic codes over F4 with D1 and D2
the associated pair of odd-like duadic codes, the following are equivalent:
(i) C_1^{⊥H} = D_1.
(ii) C_2^{⊥H} = D_2.
(iii) C_1µ_{−2} = C_2.
(iv) C_2µ_{−2} = C_1.
Theorem 6.4.6 If C 1 and C 2 are a pair of even-like duadic codes over F4 with D1 and D2
the associated pair of odd-like duadic codes, the following are equivalent:
(i) C_1^{⊥H} = D_2.
(ii) C_2^{⊥H} = D_1.
(iii) C_1µ_{−2} = C_1.
(iv) C_2µ_{−2} = C_2.
Exercise 348 Identify the duadic codes over F4 and both their ordinary duals and Hermitian
duals in Examples 6.1.8 and 6.1.9 and in Exercise 334.
Exercise 349 Prove Theorems 6.4.4, 6.4.5, and 6.4.6.
In the binary case, µ−1 gives every splitting of duadic codes of prime length p ≡ −1
(mod 8), as the following result shows.
Theorem 6.4.7 If p is a prime with p ≡ −1 (mod 8), then every splitting of p over F2 is
given by µ−1 . Furthermore, every binary even-like duadic code of length p ≡ −1 (mod 8)
is self-orthogonal.
Proof: The second part of the theorem follows from the first part and Theorem 6.4.1. For
the first part, suppose that µ_a gives a splitting S_1 and S_2 for p. As µ_a interchanges S_1 and S_2,
µ_a, and hence a, cannot have odd (multiplicative) order. Suppose that a has order 2w. Then
a^w is a solution of x^2 = 1 in Z_p = F_p, which has only the solutions 1 and −1. Since a^w ≠ 1
in Z_p, a^w ≡ −1 (mod p). If w = 2v, then −1 is the square of a^v in Z_p, a contradiction to
Lemma 6.2.4. So w is odd. Since µ_a swaps S_1 and S_2, applying µ_a an odd number of times
swaps them as well. Hence (µ_a)^w = µ_{a^w} = µ_{−1} gives the same splitting as µ_a.
We use this result to find all the splittings of 31 over F2 leading to all the binary duadic
codes of length 31.
Example 6.4.8 There are seven 2-cyclotomic cosets modulo 31, namely: C0 = {0}, C1 =
{1, 2, 4, 8, 16}, C3 = {3, 6, 12, 17, 24}, C5 = {5, 9, 10, 18, 20}, C7 = {7, 14, 19, 25, 28},
C11 = {11, 13, 21, 22, 26}, C15 = {15, 23, 27, 29, 30}. By Theorem 6.4.7 all splittings S1
and S2 can be obtained using µ−1 as 31 ≡ −1 (mod 8). Since C1 µ−1 = C15 , one of C1 and
C15 is in S1 and the other is in S2 . By renumbering S1 and S2 if necessary, we may assume
that C1 is in S1 . Similarly as C3 µ−1 = C7 and C5 µ−1 = C11 , precisely one of C3 or C7 and
one of C5 and C11 is in S1 . Thus there are four possible splittings and hence four possible
pairs of even-like duadic codes (and four odd-like pairs as well). We give the splittings in
the following table.
Splitting   S1                    S2
1           C1 ∪ C5 ∪ C7          C3 ∪ C11 ∪ C15
2           C1 ∪ C5 ∪ C3          C7 ∪ C11 ∪ C15
3           C1 ∪ C11 ∪ C7         C3 ∪ C5 ∪ C15
4           C1 ∪ C11 ∪ C3         C7 ∪ C5 ∪ C15
By Theorems 6.1.10 and 6.1.11, we can use these splittings to determine generating
idempotents directly. The generating idempotents for the even-like duadic codes are e_i(x) =
1 + ∑_{j∈S_i} x^j; for the odd-like duadic codes they are e_i(x) = ∑_{j∈S_i} x^j. The multiplier µ_3
sends splitting number 2 to splitting number 3; it also sends splitting number 3 to splitting
number 4. Hence the corresponding generating idempotents are mapped by µ3 in the same
way, showing that the six even-like duadic codes (and six odd-like duadic codes) arising
from these splittings are equivalent by Theorem 4.3.13. These turn out to be the punctured
Reed–Muller codes R(2, 5)∗ ; see Section 1.10. All multipliers map splitting number 1 to
itself (although a multiplier may reverse S1 and S2 ). By Theorem 4.3.17, as 31 is a prime, a
duadic code arising from splitting number 1 is not equivalent to any duadic code arising from
splitting number 2, 3, or 4. The duadic codes arising from this first splitting are quadratic
residue codes. See Section 6.6.
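The bookkeeping in Example 6.4.8 can be reproduced in a few lines; the sketch below (our own helper) pairs the 2-cyclotomic cosets modulo 31 under µ_{−1} and counts the resulting splittings once C_1 is placed in S_1:

```python
def coset(s, n=31, q=2):
    """2-cyclotomic coset of s modulo 31."""
    c = set()
    while s not in c:
        c.add(s)
        s = s * q % n
    return frozenset(c)

pairs, seen = [], set()
for s in range(1, 31):
    c = coset(s)
    if c not in seen:
        partner = coset((-min(c)) % 31)   # image of the coset under mu_{-1}
        seen |= {c, partner}
        pairs.append((min(c), min(partner)))
print(pairs)                   # pairing of cosets under mu_{-1}
print(2 ** (len(pairs) - 1))   # number of splittings with C1 in S1
```

This recovers the pairing C_1 ↔ C_15, C_3 ↔ C_7, C_5 ↔ C_11 and the count of four splittings given in the table.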
Theorem 6.4.7 shows when all splittings over F2 of an odd prime p are given by µ−1 .
The following result for splittings of an odd prime p over F_4 is analogous; the proof is in
[266].
Theorem 6.4.9 If p is an odd prime, then every splitting of p over F4 is given by:
(i) both µ−1 and µ−2 when p ≡ −1 (mod 8),
(ii) both µ−1 and µ2 when p ≡ 3 (mod 8), and
(iii) both µ−2 and µ2 when p ≡ −3 (mod 8).
Theorems 6.4.7 and 6.4.9 do not examine the case p ≡ 1 (mod 8). The next result
shows what happens there; see Lemma 6.4.16 and [266]. Part of this result is explained in
Exercise 351.
Theorem 6.4.10 Let p ≡ 1 (mod 8) be a prime. The following hold:
(i) If ord p (2) is odd, then some splittings for p over F2 , but not all, can be given by µ−1 ;
splittings given by µ−1 are precisely those given by µ−2 .
(ii) If ord p (2) is odd, then some splittings for p over F4 , but not all, can be given by µ−1 ;
splittings given by µ−1 are precisely those given by µ−2 .
(iii) If 2 | ord p (2) but 4 ∤ ord p (2), then some splittings for p over F4 , but not all, can be
given by µ−1 ; splittings given by µ−1 are precisely those given by µ2 .
(iv) If 4 | ord p (2), then some splittings for p over F4 , but not all, can be given by µ−2 ;
splittings given by µ−2 are precisely those given by µ2 .
Example 6.4.11 We look at various primes p ≡ 1 (mod 8) in light of Theorem 6.4.10. If
p = 17, ord p (2) = 8 and the theorem indicates that some, but not all, splittings over F4 are
given by both µ−2 and µ2 . If p = 41, ord p (2) = 20 and there is some splitting of 41 over
F4 given by both µ−2 and µ2 . If p = 57, ord p (2) = 18 and there is some splitting of 57 over
F4 given by both µ−1 and µ2 . Finally, if p = 73, ord p (2) = 9 and there is some splitting
of 73 over F2 given by both µ−1 and µ−2 ; there is also some splitting of 73 over F4 given
by both µ−1 and µ−2 .
Exercise 350 Find the splittings alluded to in Example 6.4.11. Also, what are the other
splittings and what multipliers give them?
Exercise 351 This exercise explains part of Theorem 6.4.10. In Section 6.6 we will
discuss quadratic residue codes over Fq , which are merely duadic codes of odd prime
length p with splittings over Fq given by S1 = Q p and S2 = N p , where Q p are the
nonzero squares and N p the nonsquares in F p . Binary quadratic residue codes exist if
p ≡ ±1 (mod 8); quadratic residue codes over F4 exist for any odd prime p. Do the
following:
(a) Show that in F p the product of two squares or two nonsquares is a square.
(b) Show that in F p the product of a square and a nonsquare is a nonsquare.
(c) Show that if p ≡ 1 (mod 8), then −1, 2, and −2 are all squares in F p .
(d) Prove that if p ≡ 1 (mod 8), the splitting S1 = Q p and S2 = N p over Fq cannot be
given by either µ−1 or µ−2 .
We now consider extending odd-like duadic codes over Fq . Recall that if D is an odd-like code of length n, the ordinary extension D̂ of D is defined by D̂ = {ĉ | c ∈ D}, where ĉ = c c∞ = c0 c1 · · · cn−1 c∞ and c∞ = −(c0 + c1 + · · · + cn−1).
Since the odd-like duadic codes are [n, (n + 1)/2] codes by Theorem 6.1.3(v), the extended
codes would be [n + 1, (n + 1)/2] codes. Hence these codes could potentially be self-dual.
If D is an odd-like duadic code, D is obtained from its even-like subcode C by adding
j(x) to a basis of C. But j(x) is merely a multiple of the all-one vector 1, and so if we
want to extend D in such a way that its extension is either self-dual or dual to the other
extended odd-like duadic code, then we must extend 1 in such a way that it is orthogonal
to itself. The ordinary extension will not always yield this. However, the next result shows
that we can modify the method of extension for D1 and D2 if µ−1 gives the splitting and the equation

1 + γ²n = 0    (6.11)

has a solution γ in Fq . Let D be an odd-like duadic code and suppose (6.11) has a solution. If c = c0 c1 · · · cn−1 ∈ D, let c̃ = c0 c1 · · · cn−1 c∞ , where c∞ = −γ (c0 + c1 + · · · + cn−1).
The new extension of D is defined by D̃ = {c̃ | c ∈ D}. Notice that the codes D̂ and D̃ are monomially equivalent because D̃ = D̂D, where D is the (n + 1) × (n + 1) diagonal matrix D = diag(1, 1, . . . , 1, γ ).
Theorem 6.4.12 Let D1 and D2 be a pair of odd-like duadic codes of length n over Fq .
Assume that there is a solution γ ∈ Fq to (6.11). The following hold:
(i) If µ−1 gives the splitting for D1 and D2 , then D̃1 and D̃2 are self-dual.
(ii) If D1 µ−1 = D1 , then D̃1 and D̃2 are duals of each other.
Proof: Let C 1 and C 2 be the pair of even-like duadic codes associated to D1 and D2 . We
have the following two observations:
(a) j̃(x) is orthogonal to itself.
(b) j̃(x) is orthogonal to C̃i .
The first holds because as a vector j̃(x) is (1/n, 1/n, . . . , 1/n, −γ ) ∈ Fq^{n+1} . By the choice of γ , j̃(x) is orthogonal to itself. The second holds because Ci is even-like and the extended coordinate in C̃i is always 0.
Assume that µ−1 gives the splitting of D1 and D2 . Then C1 µ−1 = C2 . By Theorem 6.4.2, C1 = D1⊥ ; also C1 is self-orthogonal by Theorem 6.4.1. As the extension of C1 is trivial, C̃1 is self-orthogonal. From (a) and (b) the code spanned by C̃1 and j̃(x) is self-orthogonal. By Theorem 6.1.3(v) and (ix), this self-orthogonal code is D̃1 and so is self-dual. Analogously, D̃2 is self-dual, proving (i).
Now assume that D1 µ−1 = D1 . Then C1 µ−1 = C1 . By Theorem 6.4.3, C2 = D1⊥ and hence C̃2 and C̃1 are orthogonal to each other by Theorem 6.1.3(vi). From (a) and (b) the codes spanned by C̃1 and j̃(x) and by C̃2 and j̃(x) are orthogonal to each other. By Theorem 6.1.3(v) and (ix), these codes are D̃1 and D̃2 of dimension (n + 1)/2; hence they are duals of each other.
Before proceeding, we make a few remarks.
• A generator matrix for D̃i is obtained by taking any generator matrix for Ci , adjoining a column of zeros, and adding the row 1, 1, . . . , 1, −γ n representing n j̃(x).
• In general, γ satisfying (6.11) exists in Fq if and only if n and −1 are both squares or both nonsquares in Fq .
• If q is a square, then Fq contains γ such that (6.11) holds, as we now see. If q = r^{2s} , where r is a prime, then every polynomial of degree 2 with entries in Fr has roots in Fr² , which is a subfield of Fq by Theorem 3.5.3. Since γ is a root of the quadratic nx² + 1, Fr² ⊆ Fq contains γ .
• In (6.11), γ = 1 if n ≡ −1 (mod r ), where Fq has characteristic r . For such cases, D̃i = D̂i . In particular, this is the case if Fq has characteristic 2. More particularly, D̃i = D̂i if the codes are either over F2 or F4 .
• There are values of n and q where duadic codes exist but γ does not. For example, if q = 3, duadic codes exist only if n ≡ ±1 (mod 12) by Theorem 6.3.3. Suppose that n ≡ 1 (mod 12). If γ ∈ F3 satisfies (6.11), then 1 + γ² = 0 in F3 , which is not possible. If n ≡ −1 (mod 12) and if γ ∈ F3 satisfies (6.11), then 1 − γ² = 0 in F3 , which has solution γ = ±1. Thus if n ≡ −1 (mod 12), we may choose γ = 1, and then D̃i is the ordinary extension D̂i .
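For a prime field Fq , the solutions γ of (6.11) can be found by brute force; a minimal sketch (the function name gammas is ours):

```python
def gammas(q, n):
    # all gamma in the prime field F_q with 1 + gamma^2 * n = 0, i.e. (6.11)
    return [g for g in range(q) if (1 + g * g * n) % q == 0]

print(gammas(5, 11))   # length 11 over F_5
print(gammas(3, 13))   # n = 13 = 1 (mod 12): no solution over F_3
print(gammas(3, 11))   # n = 11 = -1 (mod 12): gamma = +/-1
```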
Example 6.4.13 The 5-cyclotomic cosets modulo 11 are C0 = {0}, C1 = {1, 3, 4, 5, 9},
and C2 = {2, 6, 7, 8, 10}. There is one splitting S1 = C1 and S2 = C2 of 11 over F5 ; it is
given by µ−1 . Let i 1 (x) = x + x 3 + x 4 + x 5 + x 9 and i 2 (x) = x 2 + x 6 + x 7 + x 8 + x 10 .
The generating idempotent of E 11 over F5 is −i 1 (x) − i 2 (x). The even-like duadic codes C 1
and C 2 of length 11 over F5 have generating idempotents e1 (x) = i 1 (x) − 2i 2 (x) and e2 (x) =
−2i 1 (x) + i 2 (x). The odd-like duadic codes D1 and D2 have generating idempotents 1 + 2i 1 (x) − i 2 (x) and 1 − i 1 (x) + 2i 2 (x). The solutions of (6.11) are γ = ±2. So D̃i ≠ D̂i ; however, D̃i is self-dual by Theorem 6.4.12(i).
Exercise 352 Verify the claims made in Example 6.4.13.
In case there is no solution to (6.11) in Fq , as long as we are willing to rescale the last coordinate differently for D1 and D2 , we can obtain dual codes if D1 µ−1 = D1 . It is left as an exercise to show that if D1 µ−1 = D1 , then D̂1 and D̂2 D′ are duals, where D′ = diag(1, 1, . . . , 1, −1/n).
Exercise 353 Let D1 and D2 be odd-like duadic codes over Fq of length n with Di µ−1 = Di . Show that D̂1 and D̂2 D′ are dual codes, where D′ = diag(1, 1, . . . , 1, −1/n).
In a manner similar to Theorem 6.4.12, the following theorem explains the duality of the extended odd-like duadic codes over F4 under the Hermitian inner product. Recall that in this case γ = 1 is a solution of (6.11) and D̃i = D̂i .
Theorem 6.4.14 Let D1 and D2 be a pair of odd-like duadic codes of length n over F4 .
The following hold:
(i) If µ−2 gives the splitting for D1 and D2 , then D̂1 and D̂2 are Hermitian self-dual.
(ii) If D1 µ−2 = D1 , then D̂1 and D̂2 are Hermitian duals of each other.
Exercise 354 Prove Theorem 6.4.14.
We can characterize the lengths for which there exist self-dual extended cyclic binary
codes. To do that, we need the following two lemmas.
Lemma 6.4.15 For an odd prime p and a positive integer b, −1 is a power of 2 modulo p^b if and only if ord p (2) is even.
Proof: Suppose that ord p (2) is even. If w = ord p^b (2), as 2^w ≡ 1 (mod p^b ) implies that 2^w ≡ 1 (mod p), w cannot be odd. So w = 2r and (2^r )² ≡ 1 (mod p^b ). Thus 2^r is a solution of x² ≡ 1 (mod p^b ); by Exercise 355, 2^r ≡ ±1 (mod p^b ). Since 2^r ≢ 1 (mod p^b ), as w = 2r is the order of 2, we have 2^r ≡ −1 (mod p^b ) and −1 is a power of 2 modulo p^b . Conversely, suppose that −1 is a power of 2 modulo p^b . Then −1 is a power of 2 modulo p and so ord p (−1) | ord p (2); as ord p (−1) = 2, ord p (2) is even.
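Lemma 6.4.15 is easy to spot-check numerically for small p and b; a brief sketch (the helper names are ours):

```python
def mult_order(a, m):
    # multiplicative order of a modulo m (gcd(a, m) = 1 assumed)
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def minus_one_is_power_of_2(m):
    # is -1 a power of 2 modulo m?
    return any(pow(2, j, m) == m - 1 for j in range(1, mult_order(2, m) + 1))

# -1 is a power of 2 modulo p^b exactly when ord_p(2) is even
for p in (3, 5, 7, 11, 13, 17, 23):
    for b in (1, 2, 3):
        assert minus_one_is_power_of_2(p**b) == (mult_order(2, p) % 2 == 0)
```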
Exercise 355 Let p be an odd prime and b a positive integer. Prove that the only solutions of x² ≡ 1 (mod p^b ) are x ≡ ±1 (mod p^b ).
Lemma 6.4.16 Let p be an odd prime and a be a positive integer. A splitting of p^a over F2 given by the multiplier µ−1 exists if and only if ord p (2) is odd.
Proof: Let Ci be the 2-cyclotomic coset modulo p^a containing i. Suppose that a splitting S1 and S2 of p^a over F2 given by µ−1 exists. Then C1 and C1 µ−1 are not in the same Si . Hence −1 ∉ C1 and so −1 is not a power of 2. By Lemma 6.4.15, ord p (2) is odd.
Conversely, suppose ord p (2) is odd. Suppose Ci µ−1 = Ci for some i ≢ 0 (mod p^a ). Then i2^j ≡ −i (mod p^a ) for some j. Hence 2^j ≡ −1 (mod p^b ) for some 1 ≤ b ≤ a. But this contradicts Lemma 6.4.15. Therefore, Ci µ−1 ≠ Ci for i ≢ 0 (mod p^a ). One splitting of p^a is given by placing exactly one of Ci or Ci µ−1 in S1 and the other in S2 . This is possible as Ci and Ci µ−1 are distinct if i ≢ 0 (mod p^a ).
Exercise 356 Find all splittings of p = 89 over F2 given by µ−1 . Find one splitting not
given by µ−1 .
Before continuing, we remark that Theorem 6.4.10(i) is a special case of Lemma 6.4.16.
Theorem 6.4.17 Self-dual extended cyclic binary codes of length n + 1 exist if and only if
n = p1^{a1} p2^{a2} · · · pr^{ar} , where p1 , . . . , pr are distinct primes such that for each i either:
(i) pi ≡ −1 (mod 8), or
(ii) pi ≡ 1 (mod 8) and ord pi (2) is odd.
Proof: We first show that n has a splitting given by µ−1 if and only if (i) or (ii) hold. By Lemma 6.3.1, this is equivalent to showing that pi^{ai} has a splitting given by µ−1 if and only if pi satisfies either (i) or (ii). By Theorem 6.3.3 and Corollary 6.1.6, the only primes that can occur in any splitting of pi^{ai} satisfy pi ≡ ±1 (mod 8). If pi ≡ −1 (mod 8), then ord pi (2) is odd by Lemma 6.2.6. So by Lemma 6.4.16, n has a splitting over F2 given by µ−1 if and only if (i) or (ii) hold.
Suppose that n has a splitting over F2 given by µ−1 . Let D1 and D2 be odd-like duadic codes given by this splitting. As D̃i = D̂i in the binary case, the codes D̂i are self-dual by Theorem 6.4.12. Conversely, suppose that there is a self-dual extended cyclic binary code D̂ of length n + 1, where D is an [n, (n + 1)/2] cyclic code. Let C be the [n, (n − 1)/2] even-like subcode of D; as D̂ is self-dual, C is self-orthogonal. Then µ−1 gives a splitting of n over F2 by Theorem 6.4.1.
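The condition on each prime in Theorem 6.4.17 is mechanical to test; a sketch (the helper names are ours):

```python
def mult_order(a, m):
    # multiplicative order of a modulo m (gcd(a, m) = 1 assumed)
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def is_prime(p):
    return p > 1 and all(p % d for d in range(2, int(p**0.5) + 1))

def allows_self_dual_extension(p):
    # the condition of Theorem 6.4.17 on a prime divisor p of n
    return p % 8 == 7 or (p % 8 == 1 and mult_order(2, p) % 2 == 1)

print([p for p in range(3, 100, 2)
       if is_prime(p) and allows_self_dual_extension(p)])
```

The primes reported agree with the lengths carrying † or ‡ entries in Table 6.1.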
Exercise 357 Find all lengths n with 2 < n < 200 where there exist self-dual extended
cyclic binary codes of length n + 1.
6.5 Weights in duadic codes
In this section, we present two results about the possible codeword weights in duadic codes.
The first deals with weights of codewords in binary duadic codes when the splitting is
given by µ−1 . The second deals with the minimum weight codewords in duadic codes over
arbitrary fields. We conclude the section with data about binary duadic codes of lengths up
to 241, including minimum weights of these codes.
Theorem 6.5.1 Let D1 and D2 be odd-like binary duadic codes of length n with splitting
given by µ−1 . Then for i = 1 and 2,
(i) the weight of every even weight codeword of Di is 0 mod 4, and the weight of every
odd weight codeword of Di is n mod 4, and moreover
(ii) D̂i is self-dual doubly-even if n ≡ −1 (mod 8) and D̂i is self-dual singly-even if n ≡ 1 (mod 8).
Proof: Let C 1 and C 2 be the associated even-like duadic codes. By Theorem 6.4.2 and Theorem 6.1.3(vi), C i is self-orthogonal and is the even-like subcode of Di . By Theorem 4.4.18, C i
is doubly-even and so every even weight codeword of Di has weight congruent to 0 mod 4.
By Theorem 6.1.3(ix), the odd weight codewords of Di are precisely j(x) + c(x), where
c(x) ∈ C i . As j(x) is the all-1 codeword, wt( j(x) + c(x)) ≡ n − wt(c(x)) ≡ n (mod 4).
Thus (i) holds.
By Theorem 6.3.3, n ≡ ±1 (mod 8). As µ−1 gives the splitting, Theorem 6.4.12(i) shows that D̂i = D̃i is self-dual. By part (i), the codewords of D̂i that are extensions of even weight codewords have weights congruent to 0 mod 4; those that are extensions of odd weight codewords have weights congruent to (n + 1) mod 4. Part (ii) now follows.
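For n = 7, part (i) of Theorem 6.5.1 can be verified exhaustively. The sketch below takes as given that one odd-like binary duadic code of length 7 is the [7, 4] Hamming code generated by the idempotent x + x² + x⁴ (the standard length-7 example; this identification is our assumption here, not a statement from this section):

```python
from itertools import product

n = 7
gen = [0, 1, 1, 0, 1, 0, 0]                 # idempotent x + x^2 + x^4
shifts = [gen[-i:] + gen[:-i] for i in range(n)]

# F_2-span of the cyclic shifts of the idempotent: the cyclic code it generates
code = {tuple(sum(c * v[k] for c, v in zip(coeffs, shifts)) % 2
              for k in range(n))
        for coeffs in product([0, 1], repeat=n)}

assert len(code) == 2 ** 4                  # a [7, 4] code
assert sorted({sum(v) for v in code}) == [0, 3, 4, 7]
# Theorem 6.5.1(i): even weights are 0 mod 4, odd weights are n mod 4
for w in (sum(v) for v in code):
    assert w % 4 == (0 if w % 2 == 0 else n % 4)
```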
In the next theorem, we present a lower bound on the minimum odd-like weight in odd-like duadic codes, called the Square Root Bound. If this bound is actually met and the
splitting is given by µ−1 , an amazing combinatorial structure arises involving the supports
of the minimum weight codewords. This is a precursor of the material in Chapter 8, where
we will investigate other situations for which the set of supports of codewords of a fixed
weight form interesting combinatorial structures. Recall that in Chapter 3, we defined the
support of a vector v to be the set of coordinates where v is nonzero. In our next result, the
combinatorial structure that arises is called a projective plane. A projective plane consists
of a set P of points together with a set L of lines for which the following conditions are
satisfied. A line ℓ ∈ L is a subset of points. For any two distinct points, there is exactly one
line containing these two points (that is, two distinct points determine a unique line that
passes through these two points). Any two distinct lines have exactly one point in common.
Finally, to prevent degeneracy, we must also have at least four points no three of which
are on the same line. From this definition, it can be shown (see Theorem 8.6.2) that if P
is finite, then every line has the same number µ + 1 of points, every point lies on µ + 1
lines, and there are µ2 + µ + 1 points and µ2 + µ + 1 lines. The number µ is called the
order of the projective plane. Any permutation of the points which maps lines to lines is an
automorphism of the projective plane. The projective plane is cyclic provided the plane has
an automorphism that is a (µ2 + µ + 1)-cycle. The projective plane we will obtain arises
from an odd-like duadic code of length n as follows. The set of points in the projective
plane is the set of coordinates of the code. The set of lines in the plane is the set of supports
of all the codewords of minimum weight. Thus as our code is cyclic, the resulting supports
will form a projective plane that is also cyclic. Exercise 358 illustrates this result.
Exercise 358 In Example 6.1.4, we constructed the (only) pair of odd-like binary duadic
codes of length 7. Pick one of them and write down the supports of the weight 3 codewords.
Show that these supports form a cyclic projective plane of order 2.
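The axioms of a projective plane can be checked mechanically. Taking as candidate lines the cyclic shifts of {1, 2, 4}, the nonzero squares modulo 7, one finds a cyclic projective plane of order 2; a quick sketch:

```python
from itertools import combinations

# candidate lines: cyclic shifts of {1, 2, 4} modulo 7
lines = {frozenset((s + i) % 7 for s in (1, 2, 4)) for i in range(7)}
assert len(lines) == 7

# any two distinct points lie on exactly one common line
for p, q in combinations(range(7), 2):
    assert sum(1 for l in lines if {p, q} <= l) == 1

# any two distinct lines meet in exactly one point
for l, m in combinations(lines, 2):
    assert len(l & m) == 1
```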
Theorem 6.5.2 (Square Root Bound) Let D1 and D2 be a pair of odd-like duadic codes
of length n over Fq . Let do be their (common) minimum odd-like weight. Then the following
hold:
(i) do² ≥ n.
(ii) If the splitting defining the duadic codes is given by µ−1 , then do² − do + 1 ≥ n.
(iii) Suppose do² − do + 1 = n, where do > 2, and assume the splitting is given by µ−1 .
Then for i = 1 and 2:
(a) do is the minimum weight of Di ,
(b) the supports of the minimum weight codewords of Di form a cyclic projective plane
of order do − 1,
(c) the minimum weight codewords of Di are multiples of binary vectors, and
(d) there are exactly n(q − 1) minimum weight codewords in Di .
Proof: Suppose the splitting defining the duadic codes is given by µa . Let c(x) ∈ D1 be an
odd-like vector of weight do . Then c′ (x) = c(x)µa ∈ D2 is also odd-like and c(x)c′ (x) ∈
D1 ∩ D2 as D1 and D2 are ideals in Rn . But D1 ∩ D2 = ⟨ j(x)⟩ by Theorem 6.1.3(viii). By Lemma 6.1.2(ii), c(x)c′ (x) is odd-like, and in particular nonzero. Therefore c(x)c′ (x) is a nonzero multiple of j(x), and so wt(c(x)c′ (x)) = n. The number of terms in the product c(x)c′ (x) is at most do² , implying (i). If µa = µ−1 , then the number of terms in c(x)c′ (x) is at most do² − do + 1 because, if c(x) = ∑_{j=0}^{n−1} c j x^j , the coefficient of x^0 in c(x)c′ (x) is ∑_j c j ² , where the sum is over all subscripts j with c j ≠ 0. This produces (ii).
We prove (iii) for i = 1. Suppose the splitting is given by µ−1 and n = do² − do + 1. Let c(x) = c0 + c1 x + · · · + cn−1 x^{n−1} ∈ D1 be an odd-like vector of weight do , and let c′ (x) = c(x)µ−1 . As in the proof of part (i), c(x)c′ (x) ∈ D1 ∩ D2 = ⟨ j(x)⟩, implying c(x)c′ (x) = δ j(x) for some δ ≠ 0. Let c_{i1} , . . . , c_{ido} be the nonzero coefficients of c(x). Let M be the do × n matrix whose jth row corresponds to c_{ij} x^{−ij} c(x), obtained by multiplying c(x) by the jth nonzero term of c′ (x) = c(x^{−1} ). Adding the rows of M gives a vector corresponding to c(x)c′ (x) = δ j(x). Then M has do² nonzero entries. We label the columns of M by x^0 , x^1 , x^2 , . . . , x^{n−1} . Column x^0 of M has do nonzero entries. This leaves do² − do = n − 1 nonzero entries for the remaining n − 1 columns. Since the sum of the entries in the column labeled x^i is the coefficient of x^i in δ j(x), which is nonzero, each of the last n − 1 columns contains exactly one nonzero entry. Thus the supports of any two rows of M overlap in only the x^0 coordinate.
Suppose m(x) = m0 + m1 x + · · · + mn−1 x^{n−1} ∈ D1 is a nonzero even-like vector of weight w. We show that w ≥ do + 1. By shifting m(x), we may assume that m0 ≠ 0. The supports of m(x) and each row of M overlap in coordinate x^0 . If C1 and C2 are the even-like duadic codes corresponding to D1 and D2 , then C1 is self-orthogonal and C1⊥ = D1 by Theorem 6.4.2. As m(x) ∈ C1 by Theorem 6.1.3(vi) and C1⊥ = D1 , m(x) is orthogonal to every row of M. Therefore the support of m(x) and the support of each row of M overlap in at least one more coordinate. As columns x^1 , . . . , x^{n−1} of M each have exactly one nonzero entry, wt(m(x)) = w ≥ do + 1. This proves (iii)(a) and shows that the codewords in D1 of weight do must be odd-like.
Let P = {x 0 , x 1 , . . . , x n−1 } be called points; let L, which we will call lines, be the distinct
supports of all codewords in D1 of weight do , with supports considered as subsets of P.
Let ℓc and ℓm be distinct lines associated with c(x), m(x) ∈ D1 , respectively. As c(x) and
m(x) have weight do , they are odd-like. Hence by Theorem 6.1.3(ix), c(x) = α j(x) + a(x)
and m(x) = β j(x) + b(x), where a(x), b(x) ∈ C1 and α, β are nonzero elements of Fq . As C1⊥ = D1 , the inner product of c(x) and m(x) is a nonzero multiple of the inner product of j(x) with itself, which is nonzero. Thus any two lines of L intersect in at least one point.
The size of the intersection ℓc ∩ ℓm is not changed if both ℓc and ℓm are shifted by the
same amount. Hence by shifting c(x) and m(x) by x^i for some i, we may assume x^0 ∈ ℓc and x^0 ∉ ℓm as ℓc ≠ ℓm . We construct the do × n matrix M from c(x) as above. If ℓ is
the support of any row of M, then the size of ℓ ∩ ℓm is at least one, since we just showed
that lines of L intersect in at least one point. Recall that each column of M except the
first has exactly one nonzero entry. As ℓm intersects each of the rows of M in at least one
coordinate and that cannot be the first coordinate, the size of ℓ ∩ ℓm must be exactly one
because wt(m(x)) = do and M has do rows. One of the rows of M has support ℓc , implying
that ℓc ∩ ℓm has size one for all distinct lines. In particular, two points determine at most
one line and every pair of lines intersects in exactly one point. To complete (iii)(b), we only
need to show that two points determine at least one line. The set L is invariant under cyclic
shifts as D1 is cyclic, and so we only need to show that x 0 and x i , for i > 0, determine at
least one line. But that is clear, as all lines corresponding to rows of M have a nonzero
entry in coordinate x 0 , and some row has a nonzero entry in coordinate x i . It follows that
L is a cyclic projective plane of order do − 1 giving (iii)(b).
Let c(x) = c0 + c1 x + · · · + cn−1 x^{n−1} ∈ D1 have weight do . Defining M as above, if r > s and cr , cs are nonzero, the unique nonzero entry of column x^{r−s} of M is cr cs . The rows of M sum to a vector corresponding to c(x)c(x^{−1} ) = δ j(x), where δ ≠ 0. Thus each column of M sums to (1/n)δ. So if i < j < k and ci , c j , ck are nonzero (three such values exist as do > 2), ci c j = ci ck = c j ck = (1/n)δ, implying that ci = c j = ck . Hence (iii)(c) holds. Since two points of the plane determine a unique line and each line contains do points, there are

(n(n − 1)/2) / (do (do − 1)/2) = n

lines in L; since every minimum weight codeword is a multiple of the binary vector corresponding to a line, (iii)(d) follows.
In part (iii) of the previous theorem, the restriction that do > 2 is not significant because if do ≤ 2 and n = do² − do + 1, then n ≤ 3.
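Against sample rows of Table 6.1, the Square Root Bound is easy to confirm; the (n, do) pairs and µ−1 flags below are taken from that table:

```python
# (n, d_o, splitting_given_by_mu_minus_1) sampled from Table 6.1
rows = [(7, 3, True), (23, 7, True), (47, 11, True),
        (73, 9, True), (89, 17, True), (113, 15, False)]

for n, do, mu_minus_1 in rows:
    bound = do * do - do + 1 if mu_minus_1 else do * do
    assert bound >= n                    # the Square Root Bound holds
    if mu_minus_1 and bound == n:        # bound met: a projective plane arises
        print(f"n = {n}: plane of order {do - 1}")
```

The two lengths at which the bound is met, 7 and 73, are exactly the lengths for which the text reports projective planes of orders 2 and 8.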
Exercise 359 In this exercise, you will construct certain duadic codes over F3 of length
13. The 3-cyclotomic cosets modulo 13 are:
C0 = {0}, C1 = {1, 3, 9}, C2 = {2, 5, 6}, C4 = {4, 10, 12}, C7 = {7, 8, 11}.
Let ci (x) = ∑_{j∈Ci} x^j .
(a) Show that c1 (x)² = c2 (x) − c4 (x). Then use µ2 to compute c2 (x)² , c4 (x)² , and c7 (x)² .
(b) Show that c1 (x)c2 (x) = c1 (x) + c2 (x) + c7 (x) and c1 (x)c4 (x) = c2 (x) + c7 (x). Then
use µ2 to compute ci (x)c j (x) for i < j and i, j ∈ {1, 2, 4, 7}.
(c) Construct generating idempotents for the two pairs of even-like duadic codes of length
13 over F3 whose splitting is given by µ−1 . Hint: The generating idempotents each have
weight 9. If e(x) is such an idempotent, e(x) + e(x)µ−1 = 1 − j(x).
(d) Construct the generating idempotents for the two pairs of odd-like duadic codes of
length 13 over F3 whose splitting is given by µ−1 .
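The products in parts (a) and (b) can be verified by direct computation in F3[x]/(x¹³ − 1); a sketch (the helper names are ours):

```python
def polymul(u, v, n=13, q=3):
    # multiply coefficient lists in F_q[x]/(x^n - 1)
    w = [0] * n
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            w[(i + j) % n] = (w[(i + j) % n] + a * b) % q
    return w

def coset_poly(cs, n=13):
    # the polynomial sum of x^i over the coset cs
    p = [0] * n
    for i in cs:
        p[i] = 1
    return p

c1, c2, c4, c7 = (coset_poly(cs) for cs in
                  ({1, 3, 9}, {2, 5, 6}, {4, 10, 12}, {7, 8, 11}))

# part (a): c1(x)^2 = c2(x) - c4(x) over F_3
assert polymul(c1, c1) == [(a - b) % 3 for a, b in zip(c2, c4)]
# part (b), first identity: c1(x)c2(x) = c1(x) + c2(x) + c7(x)
assert polymul(c1, c2) == [(a + b + c) % 3 for a, b, c in zip(c1, c2, c7)]
```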
(e) Show that 1 + ci (x) for i ∈ {1, 2, 4, 7} are each in some code in part (d). Hint: Do this for i = 1 and then use µ2 .
(f) Write down all 13 cyclic shifts of 1 + c1 (x). What is the resulting structure?

Table 6.1 Binary duadic codes of length n ≤ 119

   n    Idempotent            d    do    a      Number of codes
   7    1∗                    3     3    −1‡    2
  17    0, 1∗                 5     5    3⋈     2
  23    1∗                    7     7    −1‡    2
  31    1, 5, 7∗              7     7    −1‡    2
  31    1, 3, 5               7     7    −1‡    6
  41    0, 1∗                 9     9    3⋈     2
  47    1∗                   11    11    −1‡    2
  49    0, 1, 7               4     9    −1†    4
  71    1∗                   11    11    −1‡    2
  73    0, 1, 3, 5, 11        9     9    −1†    8
  73    0, 1, 3, 5, 13        9     9    −1†    8
  73    0, 1, 5, 9, 17       12    13    3⋈     4
  73    0, 1, 3, 9, 25∗      13    13    5⋈     2
  79    1∗                   15    15    −1‡    2
  89    0, 1, 3, 5, 13       12    17    −1†    8
  89    0, 1, 3, 5, 19       12    17    −1†    8
  89    0, 1, 3, 11, 33      15    15    5⋈     4
  89    0, 1, 5, 9, 11∗      17    17    3⋈     2
  97    0, 1∗                15    15    5⋈     2
 103    1∗                   19    19    −1‡    2
 113    0, 1, 9∗             15    15    3⋈     2
 113    0, 1, 3              18    19    9⋈     4
 119    0, 1, 13, 17, 21      4    15    3      4
 119    0, 1, 7, 11, 51       6    15    3      4
 119    0, 1, 7, 13, 17       8    15    3      4
 119    0, 1, 7, 11, 17      12    15    3      4
Table 6.1 gives information about the odd-like binary duadic codes D of length n ≤ 119. The splittings given in this table are obtained in a manner similar to that of Example 6.4.8. The idempotent of D is of the form ∑_{s∈I} ∑_{i∈Cs} x^i . The column “Idempotent” gives the index set I for each equivalence class of odd-like duadic codes; a ∗ in this column means that the code is a quadratic residue code. The columns “d” and “do ” are the minimum weight and minimum odd weight, respectively. The column “a” indicates the multiplier µa giving the splitting. A ⋈ in column “a” indicates that the two extended odd-like duadic codes are duals of each other (i.e. µ−1 fixes each code by Theorem 6.4.12). If the splitting is given by µ−1 , then by Theorem 6.5.1, the extended code is self-dual doubly-even if n ≡ −1 (mod 8) and self-dual singly-even if n ≡ 1 (mod 8). We denote the codes whose extensions are self-dual singly-even by † and the codes whose extensions are self-dual doubly-even by ‡ in the column “a”. Finally, the column “Number of codes” is the number of either
even-like or odd-like duadic codes in an equivalence class of duadic codes constructed from
the given splitting. (For example, a 6 in the “Number of codes” column means that there
are six equivalent even-like codes and six equivalent odd-like codes in the class of codes
represented by the given splitting.) Theorem 4.3.17 shows that equivalence is determined
by multipliers except in the case n = 49; however, in that case using multipliers is enough
to show that there is only one class. The table comes from [277]. (The values of do and d
for the [113, 57, 18] code in the table were computed probabilistically in [277] and exactly
in [315]. This code has higher minimum weight than the quadratic residue code of the same
length.) The minimum weight codewords in the [7, 4, 3] and [73, 37, 9] codes support
projective planes of orders 2 and 8, respectively; see Exercise 358. There are 16 duadic
codes of length 73 containing projective planes of order 8; while the codes fall into two
equivalence classes of eight codes each, the projective planes are all equivalent. The codes
of length 31, which are not quadratic residue codes, are punctured Reed–Muller codes; see
Example 6.4.8.
Information about the binary duadic odd-like codes of lengths 127 ≤ n ≤ 241 has been
computed in [277]. We summarize some of this information in Table 6.2. In this table we
list in the column labeled “d” the minimum weights that occur. The minimum weights reported here were computed probabilistically. Some of these were computed exactly in [315];
wherever exact minimum weights were found, they agreed with those computed in [277].
See [189, 277]. There may be several codes at each length. For example, if n = 217, there
are 1024 duadic codes that fall into 88 equivalence classes by Theorem 4.3.17. Further information on the idempotents and splittings can be found in [277]. In the column “Comments”,
we indicate if the extended codes are self-dual singly-even by †, self-dual doubly-even by ‡, or if the extended codes are duals of each other by ⋈. By examining Tables 6.1 and
6.2, the Square Root Bound is seen to be very useful for duadic codes of relatively small
length but becomes weaker as the length gets longer.
Table 6.2 Binary duadic codes of length 119 < n ≤ 241

   n     d    Comments
 127    15    ‡ includes punctured Reed–Muller codes
 127    16    ‡
 127    19    ‡ includes quadratic residue codes
 137    21    ⋈ quadratic residue codes only
 151    19    ‡ includes quadratic residue codes
 151    23    ‡
 161     4    †
 161     8    †
 161    16    †
 167    23    ‡ quadratic residue codes only
 191    27    ‡ quadratic residue codes only
 193    27    ⋈ quadratic residue codes only
 199    31    ‡ quadratic residue codes only
 217     4    †
 217     8    †
 217    12    †
 217    16    †
 217    20    †
 217    24    †
 223    31    ‡ includes quadratic residue codes
 233    25    ⋈ quadratic residue codes only
 233    29    †
 233    32    ⋈
 239    31    ‡ quadratic residue codes only
 241    25    ⋈
 241    30    ⋈
 241    31    ⋈ quadratic residue codes only

Table 6.3, taken from [266], with certain values supplied by Philippe Gaborit, gives information about the odd-like duadic codes over F4 of odd lengths 3 through 41. No prime lengths p ≡ −1 (mod 8) are listed, as these duadic codes have the same generating idempotents as those given in Table 6.1; because these codes have a binary generator matrix, their minimum weight is the same as the minimum weight of the corresponding binary code by Theorem 3.8.8. For the same reason we do not list those codes of prime length p ≡ 1 (mod 8) which have binary idempotents (also described in Table 6.1). The codes of length p ≡ −1 (mod 8) have extensions that are self-dual under both the ordinary and the Hermitian inner products; see Theorems 6.4.9 and 6.4.12 and Exercise 360. In Table 6.3, the column labeled “n” gives the length, “d” the minimum weight, and “d̂” the minimum weight of the extended code. Each row of the table represents a family of odd-like duadic codes that are permutation equivalent. The column “Number of codes” is the number of odd-like (or even-like) duadic codes in the equivalence class. In the “Comments” column, the symbol ♠ means that the splitting is given by µ−1 and so the extended codes are self-dual with respect to the ordinary inner product. The symbol ♥ means that the splitting is given by µ−2 and so the extended codes are self-dual with respect to the Hermitian inner product; see Theorem 6.4.14. At length 21, one of the splittings given by µ−1 yields odd-like duadic codes whose weight 5 codewords support a projective plane of order 4.
Exercise 360 Prove that if C is an [n, n/2] code over F4 with a generator matrix consisting
of binary vectors, which is self-dual under the ordinary inner product, then C is also self-dual
under the Hermitian inner product.
It has been a long-standing open problem to find a better bound than the Square Root
Bound, since this seems to be a very weak bound. Additionally, it is not known whether
the family of duadic codes over Fq is asymptotically good or bad. (See Section 5.1 for the
definition of asymptotically good.) Also there is no efficient decoding scheme known for
duadic codes. Finding such a scheme would enhance their usefulness greatly. We pose these
questions as research problems.
Table 6.3 Duadic codes over F4 of length n ≤ 41 (excluding those with binary generating idempotents)

   n     d    d̂    Number of codes    Comments
   3     2     3    2    ♠ quadratic residue
   5     3     4    2    ♥ quadratic residue
   9     3     3    4    ♠
  11     5     6    2    ♠ quadratic residue
  13     5     6    2    ♥ quadratic residue
  15     6     7    4
  15     6     6    4
  15     4     4    4
  15     3     3    4
  17     7     8    4    ♥
  19     7     8    2    ♠ quadratic residue
  21     4     4    4    ♠
  21     5     6    4    ♠ projective plane
  21     6     6    4    ♠
  21     3     3    4    ♠
  25     3     4    4    ♥
  27     3     3    4    ♠
  27     3     3    4    ♠
  29    11    12    2    ♥ quadratic residue
  33     3     3    4    ♠
  33    10    10    4    ♠
  33     6     6    4    ♠
  33     6     6    4    ♠
  35     8     8    8    ♥
  35     7     8    4    ♥
  35     4     4    4    ♥
  37    11    12    2    ♥ quadratic residue
  39     6     6    4
  39     3     3    4
  39     3     3    4
  39    10    11    4
  41    11    12    4    ♥
Research Problem 6.5.3 Improve the Square Root Bound for either the entire family of
duadic codes or the subfamily of quadratic residue codes.
Research Problem 6.5.4 Decide whether or not the family of duadic codes is asymptotically good or bad.
Research Problem 6.5.5 Find an efficient decoding scheme for either the entire family of
duadic codes or the subfamily of quadratic residue codes.
6.6 Quadratic residue codes
In this section we study more closely the family of quadratic residue codes, which, as we
have seen, are special cases of duadic codes. These codes, or extensions of them, include
the Golay codes and the hexacode. Quadratic residue codes are duadic codes over Fq of
odd prime length n = p; by Theorem 6.3.2, q must be a square modulo n. Throughout this
section, we will let n = p be an odd prime not dividing q; we will assume that q is a prime
power that is a square modulo p. Let Q p denote the set of nonzero squares modulo p, and
let N p be the set of nonsquares modulo p. The sets Q p and N p are called the nonzero
quadratic residues and the quadratic nonresidues modulo p, respectively.
We begin with the following elementary lemma.
Lemma 6.6.1 Let p be an odd prime. The following hold:
(i) |Q p | = |N p | = ( p − 1)/2.
(ii) Modulo p, we have Q p a = Q p , N p a = N p , Q p b = N p , and N p b = Q p when a ∈
Q p and b ∈ N p .
Proof: The nonzero elements of the field F p form a cyclic group F∗p of even order p − 1 with generator α. Q p is the set of even powers of α, that is, Q p = {α^{2i} | 0 ≤ i < ( p − 1)/2}; this set forms a subgroup of index 2 in F∗p . Furthermore, N p is the coset Q p α. The results now follow easily.
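Lemma 6.6.1 can be spot-checked directly, e.g. for p = 13; a sketch:

```python
def residues(p):
    # nonzero quadratic residues Q_p and nonresidues N_p modulo an odd prime p
    Q = {i * i % p for i in range(1, p)}
    N = set(range(1, p)) - Q
    return Q, N

p = 13
Q, N = residues(p)
assert len(Q) == len(N) == (p - 1) // 2           # part (i)

# part (ii): a residue fixes Q_p and N_p; a nonresidue swaps them
for a in Q:
    assert {x * a % p for x in Q} == Q and {x * a % p for x in N} == N
for b in N:
    assert {x * b % p for x in Q} == N and {x * b % p for x in N} == Q
```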
This lemma implies that the product of two residues or two nonresidues is a residue, while
the product of a residue and a nonresidue is a nonresidue; these are facts we use throughout
this section (see also Exercise 351). As a consequence of this lemma and Exercise 361, the
pair of sets Q p and N p is a splitting of p given by the multiplier µb for any b ∈ N p . This
splitting determines the defining sets for a pair of even-like duadic codes and a pair of
odd-like duadic codes, called the quadratic residue codes or QR codes, of length p over Fq .
The odd-like QR codes have defining sets Q p and N p and dimension ( p + 1)/2, while the
even-like QR codes have defining sets Q p ∪ {0} and N p ∪ {0} and dimension ( p − 1)/2.
This discussion proves the following theorem.
Theorem 6.6.2 Quadratic residue codes of odd prime length p exist over Fq if and only if
q ∈ Qp.
Exercise 361 Prove that if p is an odd prime, Q p and N p are each unions of q-cyclotomic
cosets if and only if q ∈ Q p .
In the next two subsections, we will present the generating idempotents for the QR codes
over fields of characteristic 2 and 3 as described in [274]. The following two theorems will
assist us with this classification. The first theorem provides, among other things, a form that
an idempotent must have if it is the generating idempotent for a quadratic residue code. In
that theorem, we have to distinguish between QR codes and trivial codes. The trivial codes
of length p over Fq are: 0, Fq^p , the even-like subcode E p of Fq^p , and the code 1 generated by
the all-one codeword. The second theorem will give the generating idempotents of the four
QR codes from one of the even-like generators and describe how the generating idempotent
of a QR code over some field is related to the generating idempotent of a QR code over an
extension field.
Theorem 6.6.3 Let C be a cyclic code of odd prime length p over Fq , where q is a square
modulo p. Let e(x) be the generating idempotent of C. The following hold:
(i) C is a quadratic residue code or one of the trivial codes if and only if e(x)µc = e(x)
for all c ∈ Q p .
(ii) If C is a quadratic residue code with generating idempotent e(x), then

e(x) = a0 + a1 Σ_{i∈Q_p} x^i + a2 Σ_{i∈N_p} x^i,

for some a0, a1, and a2 in Fq.
(iii) If c ∈ Q p and C is a quadratic residue code, then µc ∈ PAut(C).
Proof: The trivial codes 0, F_q^p, E_p, and 1 have defining sets {0} ∪ Q_p ∪ N_p, ∅, {0},
and Q p ∪ N p , respectively. Let T be the defining set of C. By Theorem 4.3.13 and
Corollary 4.4.5, the code Cµc is cyclic with generating idempotent e(x)µc and defining
set c−1 T mod p. So e(x)µc = e(x) if and only if cT ≡ T (mod p). Using Lemma 6.6.1,
e(x)µc = e(x) for all c ∈ Q p if and only if T is a union of some of {0}, Q p , or N p . So part
(i) follows. Part (ii) follows from (i) and Lemma 6.6.1 as e(x^c) = e(x) for all c ∈ Q_p. Part
(iii) also follows from part (i) because C and Cµc have the same defining set for c ∈ Q p and
hence are equal, implying that µc ∈ PAut(C).
Theorem 6.6.4 Let C be an even-like quadratic residue code of prime length p over Fq
with idempotent e(x). The following hold:
(i) The four quadratic residue codes over Fq or any extension field of Fq have generating
idempotents e(x), e(x)µb , e(x) + j(x), and e(x)µb + j(x) for any b ∈ N p .
(ii) e(x) + e(x)µb = 1 − j(x) for b ∈ N p .
(iii) The four quadratic residue codes over Fq have the same minimum weight and the same
minimum weight codewords, up to scalar multiplication, as they do over an extension
field of Fq .
Proof: By (6.3) and Theorem 6.1.3, the generating idempotents for the four QR codes
over Fq are as claimed in (i). Because these four idempotents remain idempotents over
any extension field of Fq and are associated with the same splitting of p into residues and
nonresidues, they remain generating idempotents of QR codes over any extension field of
Fq , completing (i). Part (ii) follows from (6.1), while (iii) follows from Theorem 3.8.8.
Part (i) of this theorem was already illustrated in Example 6.1.9.
6.6.1
QR codes over fields of characteristic 2
In this subsection, we will find the generating idempotents of all the QR codes over any
field of characteristic 2. We will see that we only have to look at the generating idempotents
of QR codes over F2 and F4 .
239
6.6 Quadratic residue codes
Theorem 6.6.5 Let p be an odd prime. The following hold:
(i) Binary quadratic residue codes of length p exist if and only if p ≡ ±1 (mod 8).
(ii) The even-like binary quadratic residue codes have generating idempotents
δ + Σ_{j∈Q_p} x^j and δ + Σ_{j∈N_p} x^j,

where δ = 1 if p ≡ −1 (mod 8) and δ = 0 if p ≡ 1 (mod 8).
(iii) The odd-like binary quadratic residue codes have generating idempotents
ε + Σ_{j∈Q_p} x^j and ε + Σ_{j∈N_p} x^j,

where ε = 0 if p ≡ −1 (mod 8) and ε = 1 if p ≡ 1 (mod 8).
Proof: Binary QR codes of length p exist if and only if 2 ∈ Q p by Theorem 6.6.2, which
is equivalent to p ≡ ±1 (mod 8) by Lemma 6.2.5, giving (i). Let e(x) be a generating
idempotent of one of the QR codes. By Theorem 6.6.3,
e(x) = Σ_{i∈S} x^i,
where S is a union of some of {0}, Q p , and N p . As the cases S = {0}, Q p ∪ N p , and
{0} ∪ Q p ∪ N p yield trivial codes by Exercise 362, for QR codes, S equals {0} ∪ Q p ,
{0} ∪ N p , Q p , or N p . These yield the idempotents in (ii) and (iii); one only needs to check
that their weights are even or odd as required.
Exercise 362 Prove that if C is a binary cyclic code of length p with generating idempotent
e(x) = Σ_{i∈S} x^i where S = {0}, Q_p ∪ N_p, or {0} ∪ Q_p ∪ N_p, then C is the code F_2^p, the
even subcode E_p of F_2^p, or the subcode generated by the all-one vector.
Because 4 = 2^2 is obviously a square modulo any odd prime p, by Theorem 6.6.2, QR
codes over F4 exist for any odd prime length. The idempotents for the QR codes over F4
of length p ≡ ±1 (mod 8) are the same as the binary idempotents given in Theorem 6.6.5
by Theorem 6.6.4. We now find the generating idempotents for the QR codes over F4 of
length p where p ≡ ±3 (mod 8). (Recall that F4 = {0, 1, ω, ω̄}, where ω̄ = 1 + ω = ω^2.)
Theorem 6.6.6 Let p be an odd prime. The following hold:
(i) If p ≡ ±1 (mod 8), the generating idempotents of the quadratic residue codes over F4
are the same as those over F2 given in Theorem 6.6.5.
(ii) The even-like quadratic residue codes over F4 have generating idempotents
δ + ω Σ_{j∈Q_p} x^j + ω̄ Σ_{j∈N_p} x^j and δ + ω̄ Σ_{j∈Q_p} x^j + ω Σ_{j∈N_p} x^j,

where δ = 0 if p ≡ −3 (mod 8) and δ = 1 if p ≡ 3 (mod 8).
(iii) The odd-like quadratic residue codes over F4 have generating idempotents
ε + ω Σ_{j∈Q_p} x^j + ω̄ Σ_{j∈N_p} x^j and ε + ω̄ Σ_{j∈Q_p} x^j + ω Σ_{j∈N_p} x^j,

where ε = 1 if p ≡ −3 (mod 8) and ε = 0 if p ≡ 3 (mod 8).
Proof: Part (i) follows from Theorem 6.6.4. Let e(x) be a generating idempotent for an
even-like QR code C1 over F4 with p ≡ ±3 (mod 8). By Theorem 6.6.3, e(x) = a0 + a1 Q(x) +
a2 N(x), where Q(x) = Σ_{j∈Q_p} x^j and N(x) = Σ_{j∈N_p} x^j. By Lemma 6.2.5, 2 ∈ N_p. This
implies that

Q(x)^2 = Q(x^2) = N(x) and N(x)^2 = N(x^2) = Q(x)

by Lemma 6.6.1. Therefore as e(x)^2 = a0^2 + a1^2 Q(x)^2 + a2^2 N(x)^2 = a0^2 + a1^2 N(x) +
a2^2 Q(x) = e(x), we have a0 ∈ {0, 1} and a2 = a1^2. The other even-like QR code C2 paired with
C1 has generating idempotent e(x)µ2 by Theorem 6.6.4 as 2 ∈ N_p. Again by Lemma 6.2.5,

Q(x)µ2 = Q(x^2) = N(x) and N(x)µ2 = N(x^2) = Q(x)

as 2 ∈ N_p. Therefore e(x)µ2 = a0 + a1^2 Q(x) + a1 N(x). By Theorem 6.6.4(ii),

e(x) + e(x)µ2 = x + x^2 + · · · + x^{p−1},

implying that a1 + a1^2 = 1. The only possibility is a1 ∈ {ω, ω̄}. Notice that, for either choice
of a1, a1 Q(x) + a1^2 N(x) is odd-like if p ≡ 3 (mod 8) and even-like if p ≡ −3 (mod 8), as
Q(x) and N(x) each have (p − 1)/2 terms. Therefore parts (ii) and (iii) follow.
Example 6.6.7 We consider the binary QR codes of length 23. In that case,
Q23 = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18} and
N 23 = {5, 7, 10, 11, 14, 15, 17, 19, 20, 21, 22}.
The generating idempotents of the odd-like QR codes are
Σ_{j∈Q23} x^j and Σ_{j∈N23} x^j,
and the generating idempotents of the even-like QR codes are
1 + Σ_{j∈Q23} x^j and 1 + Σ_{j∈N23} x^j.
Note that the 2-cyclotomic cosets modulo 23 are {0}, Q23 , and N 23 implying that these are
in fact the only binary duadic codes of length 23. The odd-like codes are the [23, 12, 7]
binary Golay code.
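These four idempotents can also be confirmed mechanically. In the Python sketch below (our own encoding, not the book's), a polynomial in F2[x]/(x^23 − 1) is represented as the set of exponents with coefficient 1, so multiplication is exponent arithmetic with mod-2 cancellation.

```python
def poly_mult_f2(a, b, n):
    """Multiply two F2[x]/(x^n - 1) polynomials given as sets of exponents."""
    out = set()
    for i in a:
        for j in b:
            out ^= {(i + j) % n}  # toggling a term = adding coefficients mod 2
    return out

p = 23
Q23 = {(i * i) % p for i in range(1, p)}
N23 = set(range(1, p)) - Q23
for e in (Q23, N23, {0} | Q23, {0} | N23):
    assert poly_mult_f2(e, e, p) == e  # e(x)^2 = e(x): an idempotent
# Odd-like idempotents have odd weight 11; even-like ones have even weight 12.
assert len(Q23) % 2 == 1 and len({0} | Q23) % 2 == 0
```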
Example 6.6.8 Now consider the QR codes of length 5 over F4 ; note that
Q5 = {1, 4} and N5 = {2, 3}.
The generating idempotents of the odd-like QR codes are
1 + ω(x + x^4) + ω̄(x^2 + x^3) and 1 + ω̄(x + x^4) + ω(x^2 + x^3),
and the generating idempotents of the even-like QR codes are
ω(x + x^4) + ω̄(x^2 + x^3) and ω̄(x + x^4) + ω(x^2 + x^3).
The 2-cyclotomic cosets modulo 5 are {0}, Q5 , and N 5 showing that the QR codes are the
only duadic codes of length 5 over F4 . The odd-like codes are equivalent to the punctured
hexacode; see Exercise 363. See also Example 6.1.8.
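The same check works over F4. In the sketch below we encode 0, 1, ω, ω̄ as the integers 0, 1, 2, 3 (our convention, not the book's), with addition given by XOR and multiplication via discrete logarithms in the cyclic group {1, ω, ω̄}; it confirms that the first odd-like idempotent of Example 6.6.8 squares to itself.

```python
# F4 multiplication table via logs: 1 = ω^0, ω = ω^1, ω̄ = ω^2 (encoding ours).
LOG, EXP = {1: 0, 2: 1, 3: 2}, {0: 1, 1: 2, 2: 3}
MUL = {(a, b): 0 for a in range(4) for b in range(4)}
for a in range(1, 4):
    for b in range(1, 4):
        MUL[(a, b)] = EXP[(LOG[a] + LOG[b]) % 3]

def poly_mult_f4(a, b, n):
    """Multiply F4[x]/(x^n - 1) polynomials given as dicts exponent -> coefficient."""
    out = {}
    for i, ca in a.items():
        for j, cb in b.items():
            k = (i + j) % n
            out[k] = out.get(k, 0) ^ MUL[(ca, cb)]  # addition in F4 is XOR
    return {k: c for k, c in out.items() if c}

# 1 + ω(x + x^4) + ω̄(x^2 + x^3), an odd-like idempotent of length 5:
e = {0: 1, 1: 2, 4: 2, 2: 3, 3: 3}
assert poly_mult_f4(e, e, 5) == e  # e(x)^2 = e(x)
```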
Exercise 363 Show that the odd-like QR codes of length 5 over F4 given in Example 6.6.8
are equivalent to the punctured hexacode, using the generator matrix of the hexacode found
in Example 1.3.4.
The idempotents that arise in Theorems 6.6.5 and 6.6.6 are the generating idempotents
for QR codes over any field of characteristic 2 as the next result shows.
Theorem 6.6.9 Let p be an odd prime. The following hold:
(i) Quadratic residue codes of length p over F_{2^t}, where t is odd, exist if and only if p ≡ ±1
(mod 8), and the generating idempotents are those given in Theorem 6.6.5.
(ii) Quadratic residue codes of length p over F_{2^t}, where t is even, exist for all p, and the
generating idempotents are those given in Theorems 6.6.5 and 6.6.6.
Proof: By Theorem 6.6.4, the result follows as long as we show that no quadratic residue
codes exist when t is odd and p ≡ ±3 (mod 8). By Theorem 6.6.5, these codes do not exist
if t = 1. If QR codes exist with t = 2s + 1 for some integer s ≥ 1, then 2^t is a square
modulo p by Theorem 6.6.2. If 2^t is a square modulo p, then 2 is also a square modulo p
as 2^t = 2 · (2^s)^2, contradicting Lemma 6.2.5.
Exercise 364 Give the generating idempotents for all quadratic residue codes of prime
length p ≤ 29 over F_{2^t}. Distinguish between the idempotents that generate even-like codes
and those that generate odd-like codes. Also distinguish between those that arise when t is
even and when t is odd.
Exercise 365 Let D1 and D2 be odd-like quadratic residue codes of prime length p over
F_{2^t} with even-like subcodes C1 and C2. Prove the following:
(a) If p ≡ −1 (mod 8), then C1 and C2 are self-orthogonal under the ordinary inner product.
(b) If p ≡ 1 (mod 8), then C1^⊥ = D2 and C2^⊥ = D1.
(c) If p ≡ 3 (mod 8) and t is even, then C1 and C2 are self-orthogonal under the ordinary inner product.
(d) If p ≡ −3 (mod 8) and t is even, then C1^⊥ = D2 and C2^⊥ = D1.
(e) If t = 2 and either p ≡ −3 (mod 8) or p ≡ −1 (mod 8), then C1 and C2 are self-orthogonal under the Hermitian inner product.
(f) If t = 2 and either p ≡ 1 (mod 8) or p ≡ 3 (mod 8), then C1^{⊥H} = D2 and C2^{⊥H} = D1.
6.6.2
QR codes over fields of characteristic 3
Analogous results hold for fields of characteristic 3. As in the last section, we let
Q(x) = Σ_{j∈Q_p} x^j and N(x) = Σ_{j∈N_p} x^j. We assume our QR codes have length p an odd prime
that cannot equal 3. We first examine quadratic residue codes over F3.
Theorem 6.6.10 Let p > 3 be prime. The following hold:
(i) Quadratic residue codes over F3 of length p exist if and only if p ≡ ±1 (mod 12).
(ii) The even-like quadratic residue codes over F3 have generating idempotents
− Σ_{j∈Q_p} x^j and − Σ_{j∈N_p} x^j,

if p ≡ 1 (mod 12), and

1 + Σ_{j∈Q_p} x^j and 1 + Σ_{j∈N_p} x^j,

if p ≡ −1 (mod 12).
(iii) The odd-like quadratic residue codes over F3 have generating idempotents
1 + Σ_{j∈Q_p} x^j and 1 + Σ_{j∈N_p} x^j,

if p ≡ 1 (mod 12), and

− Σ_{j∈Q_p} x^j and − Σ_{j∈N_p} x^j,

if p ≡ −1 (mod 12).
Proof: Part (i) follows from Theorem 6.6.2 and Lemma 6.2.9. Let p ≡ ±1 (mod 12). If
e(x) is a generating idempotent for an even-like QR code C 1 over F3 , then by Theorem 6.6.3,
e(x) = a0 + a1 Q(x) + a2 N (x), where ai ∈ F3 for 0 ≤ i ≤ 2. The other even-like QR code
C 2 paired with C 1 has generating idempotent e(x)µb , where b ∈ N p by Theorem 6.6.4.
Lemma 6.6.1 implies that
Q(x)µb = Q(x^b) = N(x) and N(x)µb = N(x^b) = Q(x).
Therefore e(x)µb = a0 + a2 Q(x) + a1 N (x).
We first consider the case p ≡ 1 (mod 12). By Theorem 6.6.4(ii),
e(x) + e(x)µb = −x − x^2 − · · · − x^{p−1},
implying that 2a0 = 0 and a1 + a2 = −1. Thus a0 = 0 and either a1 = a2 = 1 or {a1 , a2 } =
{0, −1}. If a1 = a2 = 1, then e(x) = Q(x) + N (x); but Q(x) + N (x) = −(1 − j(x)),
which is the negative of the idempotent generator of the even-like code E p . Thus a1 = a2 = 1
is not possible. So the two possibilities remaining for {a1 , a2 } must lead to generating idempotents for the two even-like QR codes that we know must exist. The generating idempotents
for the odd-like codes follow from Theorem 6.6.4.
We now consider the case p ≡ −1 (mod 12). By Theorem 6.6.4(ii),
e(x) + e(x)µb = −1 + x + x^2 + · · · + x^{p−1},
implying that 2a0 = −1 and a1 + a2 = 1. Thus a0 = 1 and either a1 = a2 = −1 or
{a1 , a2 } = {0, 1}. Again 1 − Q(x) − N (x) = −(1 − j(x)) generates E p and so a1 = a2 =
−1 is impossible. The generating idempotents for the even-like and odd-like codes follow
as above.
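Part (i) rests on Lemma 6.2.9: 3 ∈ Q_p exactly when p ≡ ±1 (mod 12). This is quick to spot-check with Euler's criterion (a sketch; the prime list is our arbitrary sample):

```python
def is_residue(a, p):
    """Euler's criterion: a is a nonzero square mod the odd prime p iff a^((p-1)/2) = 1."""
    return pow(a, (p - 1) // 2, p) == 1

for p in (5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47):
    assert is_residue(3, p) == (p % 12 in (1, 11))
```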
Table 6.4 The field F9

ρ^i     a + bρ
---     ------
0       0
1       1
ρ       ρ
ρ^2     1 + ρ
ρ^3     1 − ρ
ρ^4     −1
ρ^5     −ρ
ρ^6     −1 − ρ
ρ^7     −1 + ρ
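Table 6.4 can be regenerated mechanically. In the sketch below (our encoding), an element a + bρ of F9 is the pair (a, b) with coefficients mod 3, and multiplication uses ρ^2 = 1 + ρ.

```python
def mul(x, y):
    """(a + bρ)(c + dρ) = ac + bd·ρ^2 + (ad + bc)ρ = (ac + bd) + (ad + bc + bd)ρ."""
    (a, b), (c, d) = x, y
    return ((a * c + b * d) % 3, (a * d + b * c + b * d) % 3)

powers = [(1, 0)]                            # ρ^0 = 1
for _ in range(7):
    powers.append(mul(powers[-1], (0, 1)))   # multiply by ρ
assert len(set(powers)) == 8                 # ρ is a primitive 8th root of unity
assert mul(powers[-1], (0, 1)) == (1, 0)     # ρ^8 = 1
assert powers[2] == (1, 1)                   # ρ^2 = 1 + ρ, as in Table 6.4
assert powers[4] == (2, 0)                   # ρ^4 = −1 (and −1 ≡ 2 mod 3)
```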
Example 6.6.11 We find the generating idempotents of the QR codes of length 11 over F3 .
Here
Q11 = {1, 3, 4, 5, 9} and N11 = {2, 6, 7, 8, 10}.
The generating idempotents of the odd-like QR codes are
−(x + x^3 + x^4 + x^5 + x^9) and −(x^2 + x^6 + x^7 + x^8 + x^{10}),
and the generating idempotents of the even-like QR codes are
1 + x + x^3 + x^4 + x^5 + x^9 and 1 + x^2 + x^6 + x^7 + x^8 + x^{10}.
The 3-cyclotomic cosets modulo 11 are {0}, Q11 , and N 11 , implying that the QR codes are
the only duadic codes of length 11 over F3 . The odd-like codes are the [11, 6, 5] ternary
Golay code. Compare this example to Example 6.1.7.
We now turn to QR codes over F9. Because 9 = 3^2 is a square modulo any odd prime
p, by Theorem 6.6.2, QR codes over F9 exist for any odd prime length greater than 3.
The idempotents for the QR codes over F9 of length p ≡ ±1 (mod 12) are the same as the
idempotents given in Theorem 6.6.10 by Theorem 6.6.4. We now only need consider lengths
p where p ≡ ±5 (mod 12). The field F9 can be constructed by adjoining an element ρ to
F3, where ρ^2 = 1 + ρ. So F9 = {a + bρ | a, b ∈ F3}. Multiplication in F9 is described in
Table 6.4; note that ρ is a primitive 8th root of unity.
Theorem 6.6.12 Let p be an odd prime. The following hold:
(i) If p ≡ ±1 (mod 12), the generating idempotents of the quadratic residue codes over
F9 are the same as those over F3 given in Theorem 6.6.10.
(ii) The even-like quadratic residue codes over F9 have generating idempotents
1 + ρ Σ_{j∈Q_p} x^j + ρ^3 Σ_{j∈N_p} x^j and 1 + ρ^3 Σ_{j∈Q_p} x^j + ρ Σ_{j∈N_p} x^j,

if p ≡ 5 (mod 12), and

−ρ Σ_{j∈Q_p} x^j − ρ^3 Σ_{j∈N_p} x^j and −ρ^3 Σ_{j∈Q_p} x^j − ρ Σ_{j∈N_p} x^j,

if p ≡ −5 (mod 12).
(iii) The odd-like quadratic residue codes over F9 have generating idempotents
−ρ Σ_{j∈Q_p} x^j − ρ^3 Σ_{j∈N_p} x^j and −ρ^3 Σ_{j∈Q_p} x^j − ρ Σ_{j∈N_p} x^j,

if p ≡ 5 (mod 12), and

1 + ρ Σ_{j∈Q_p} x^j + ρ^3 Σ_{j∈N_p} x^j and 1 + ρ^3 Σ_{j∈Q_p} x^j + ρ Σ_{j∈N_p} x^j,

if p ≡ −5 (mod 12).
Proof: Part (i) follows from Theorem 6.6.4. Let e(x) be a generating idempotent for an
even-like QR code C 1 over F9 of length p with p ≡ ±5 (mod 12). Then by Theorem 6.6.3,
e(x) = a0 + a1 Q(x) + a2 N(x), where ai ∈ F9 for 0 ≤ i ≤ 2. Using Lemma 6.6.1, notice
that Q(x)^3 = Q(x^3) = N(x) and N(x)^3 = N(x^3) = Q(x) as 3 ∈ N_p by Lemma 6.2.9.
As e(x)^2 = e(x), we must have e(x)^3 = e(x). Thus e(x)^3 = a0^3 + a1^3 N(x) + a2^3 Q(x) =
e(x), implying that a0^3 = a0 and a2 = a1^3. The other even-like QR code C2 paired with
C1 has generating idempotent e(x)µb, where b ∈ N_p by Theorem 6.6.4. Again by
Lemma 6.6.1, Q(x)µb = Q(x^b) = N(x) and N(x)µb = N(x^b) = Q(x). Therefore, e(x) =
a0 + a1 Q(x) + a1^3 N(x) and e(x)µb = a0 + a1^3 Q(x) + a1 N(x).
We first consider the case p ≡ 5 (mod 12). By Theorem 6.6.4(ii),
e(x) + e(x)µb = −1 + x + x^2 + · · · + x^{p−1},
implying that 2a0 = −1 and a1 + a1^3 = 1. Thus a0 = 1; by examining Table 6.4, either
a1 = −1 or a1 ∈ {ρ, ρ^3}. As 1 − Q(x) − N(x) = −(1 − j(x)) generates E_p, a1 = −1 is
impossible. So the two possibilities remaining for a1 must lead to generating idempotents
for the two even-like QR codes that we know exist. The generating idempotents for the
odd-like codes follow from Theorem 6.6.4.
We leave the case p ≡ −5 (mod 12) as an exercise.
Exercise 366 Prove Theorem 6.6.12 in the case p ≡ −5 (mod 12).
The following theorem is analogous to Theorem 6.6.9 and is proved in the same way.
Theorem 6.6.13 Let p be an odd prime with p ≠ 3. The following hold:
(i) Quadratic residue codes of length p over F_{3^t}, where t is odd, exist if and only if p ≡ ±1
(mod 12), and the generating idempotents are those given in Theorem 6.6.10.
(ii) Quadratic residue codes of length p over F_{3^t}, where t is even, exist for all p, and the
generating idempotents are those given in Theorems 6.6.10 and 6.6.12.
Exercise 367 Give the generating idempotents for all quadratic residue codes of prime
length p ≤ 29 over F_{3^t}. Distinguish between the idempotents that generate even-like codes
and those that generate odd-like codes. Also distinguish between those that arise when t is
even and when t is odd.
Exercise 368 Let D1 and D2 be odd-like quadratic residue codes of prime length p ≠ 3
over F_{3^t} with even-like subcodes C1 and C2. Prove the following:
(a) If p ≡ −1 (mod 12), then C1 and C2 are self-orthogonal.
(b) If p ≡ 1 (mod 12), then C1^⊥ = D2 and C2^⊥ = D1.
(c) If p ≡ −5 (mod 12) and t is even, then C1 and C2 are self-orthogonal.
(d) If p ≡ 5 (mod 12) and t is even, then C1^⊥ = D2 and C2^⊥ = D1.
6.6.3
Extending QR codes
As with any of the duadic codes, we can consider extending odd-like quadratic residue
codes in such a way that the extensions are self-dual or dual to each other. These extensions
may not be the ordinary extensions obtained by adding an overall parity check, but all
the extensions are equivalent to that obtained by adding an overall parity check. Before
examining the general case, we look at QR codes over F2 , F3 , and F4 . In these cases, it is
sufficient to use the ordinary extension.
Theorem 6.6.14 Let D1 and D2 be the odd-like QR codes over Fq of odd prime length p.
(i) When q = 2, the following hold:
(a) D̂1 and D̂2 are duals of each other when p ≡ 1 (mod 8).
(b) D̂i is self-dual and doubly-even for i = 1 and 2 when p ≡ −1 (mod 8).
(ii) When q = 3, the following hold:
(a) D̂i is self-dual for i = 1 and 2 when p ≡ −1 (mod 12).
(b) If p ≡ 1 (mod 12), then D̂1 and D̂2 D are duals of each other where D is the
diagonal matrix diag(1, 1, . . . , 1, −1).
(iii) When q = 4, the following hold:
(a) When p ≡ 1 (mod 8), D̂1 and D̂2 are duals of each other under either the ordinary
or the Hermitian inner product.
(b) When p ≡ 3 (mod 8), D̂1 and D̂2 are duals of each other under the Hermitian
inner product; furthermore, D̂i is self-dual under the ordinary inner product for
i = 1 and 2.
(c) When p ≡ −3 (mod 8), D̂1 and D̂2 are duals of each other under the ordinary
inner product; furthermore, D̂i is self-dual under the Hermitian inner product for
i = 1 and 2.
(d) When p ≡ −1 (mod 8), D̂i is self-dual under either the ordinary or the Hermitian
inner product for i = 1 and 2.
Proof: Let Ci be the even-like subcode of Di.
We first consider the case q = 2 or q = 4. In either case, j(x) is the all-one vector and
its extension is the all-one vector of length p + 1; this extended all-one vector is
orthogonal to itself under either the ordinary or Hermitian inner product. Suppose first that p ≡ 1
(mod 8). By Lemmas 6.2.4 and 6.2.5, −1 and −2 are both in Q_p. Thus Ciµ−1 = Ciµ−2 = Ci
for i = 1 and 2. Applying Theorems 6.4.3 and 6.4.6, we obtain (i)(a) and (iii)(a).
Consider next the case p ≡ −1 (mod 8). This time by Lemmas 6.2.4 and 6.2.5, −1 and −2
are both in N_p. Thus C1µ−1 = C1µ−2 = C2. By Theorems 6.4.2 and 6.4.5, we obtain
part of (i)(b) and all of (iii)(d). To complete (i)(b), we note that the generating
idempotent for Di has weight (p − 1)/2 by Theorem 6.6.5 and hence Di has a generator
matrix consisting of shifts of this idempotent by Theorem 4.3.6. Thus D̂i has a
generator matrix consisting of vectors of weight ((p − 1)/2) + 1 ≡ 0 (mod 4). Thus D̂i is
doubly-even. The cases p ≡ ±3 (mod 8) arise only when q = 4. Using the same argument,
if p ≡ 3 (mod 8), −1 ∈ N_p and −2 ∈ Q_p, yielding (iii)(b) by Theorems 6.4.2 and
6.4.6. If p ≡ −3 (mod 8), −1 ∈ Q_p and −2 ∈ N_p, yielding (iii)(c) by Theorems 6.4.3 and
6.4.5.
Now let q = 3. Consider first the case p ≡ −1 (mod 12). Here the all-one vector
extends to the all-one vector of length p + 1 and it is orthogonal to itself. By Lemma 6.2.4,
−1 ∈ N_p and (ii)(a) follows from Theorem 6.4.2. Now suppose that p ≡ 1 (mod 12). By
Lemma 6.2.4, −1 ∈ Q_p and by Theorem 6.4.3, C1^⊥ = D2. In this case the all-one vector 1 in
Di extends to 1̂ = 11 · · · 1(−1) in D̂i. This vector is not orthogonal to itself, but is
orthogonal to 1̂D = 1_{p+1}, where 1_{p+1} is the all-one vector of length p + 1 and D is the diagonal
matrix diag(1, 1, . . . , 1, −1). This proves (ii)(b).
Example 6.6.15 We describe the extensions of the QR codes discussed in Examples 6.6.7,
6.6.8, and 6.6.11. We give the extension of the generating idempotent of odd-like codes; from
this one can form a basis of the extended codes using shifts of the generating idempotent
extended in the same way (see Theorem 4.3.6).
• The extended coordinate of either generating idempotent of an odd-like binary QR code
of length 23 is 1. These extended codes are each self-dual and doubly-even by
Theorem 6.6.14. They are extended binary Golay codes.
• The extended coordinate of either generating idempotent of an odd-like QR code of
length 11 over F3 is −1. The extended codes are each self-dual by Theorem 6.6.14 and
are extended ternary Golay codes.
• The extended coordinate of either generating idempotent of an odd-like QR code of length
5 over F4 is 1. These extended codes are each Hermitian self-dual; they are also dual to
each other under the ordinary inner product by Theorem 6.6.14. They are equivalent to
the hexacode.
We are now ready to describe, in general, the extensions of the odd-like QR codes D1
and D2 of length p over Fq. We want them both to be extended in the same way whenever
possible. Recall that we defined D̃ for an arbitrary odd-like duadic code D of length n
using a solution γ of (6.11). As an odd-like QR code is obtained from its even-like subcode
by adjoining the all-one vector 1, in order for D̃1 either to be self-dual or dual to D̃2, the
extended vector 1̃, which is 1 extended by some γ ∈ Fq, must be orthogonal to itself. This
means that p + γ^2 p^2 = 0 or

1 + γ^2 p = 0,    (6.12)

which is (6.11) with n = p. Suppose that Ci is the even-like subcode of Di. We know
that either −1 ∈ N_p or −1 ∈ Q_p, implying C1µ−1 = C2 or Ciµ−1 = Ci, respectively. By
Theorems 6.4.2 or 6.4.3, these yield Ci^⊥ = Di or C1^⊥ = D2, respectively. Therefore, whenever
(6.12) is satisfied, if −1 ∈ N_p, D̃i is self-dual for i = 1 and 2, and if −1 ∈ Q_p, D̃1
and D̃2 are duals of each other. This proves the following, using Lemma 6.2.4.
Theorem 6.6.16 Let D1 and D2 be the odd-like QR codes of length p over Fq, p an odd
prime. Suppose that (6.12) is satisfied. Then if p ≡ −1 (mod 4), D̃i is self-dual for i = 1
and 2, and if p ≡ 1 (mod 4), D̃1 and D̃2 are duals of each other.
Thus we are interested in the cases where (6.12) has a solution γ in Fq . This is answered
by the following. We only need the case where q is a square modulo p since we are assuming
QR codes of length p over Fq exist.
Lemma 6.6.17 Let r be a prime so that q = r^t for some positive integer t. Let p be an odd
prime with p ≠ r and assume that q is a square modulo p. There is a solution γ of (6.12)
in Fq except when t is odd, p ≡ 1 (mod 4), and r ≡ −1 (mod 4). In that case there is a
solution γ1 in Fq of −1 + γ1^2 p = 0.
Proof: If t is even, then every quadratic equation with coefficients in Fr, such as (6.12), has
a solution in F_{r^2} ⊆ Fq. Assume that t is odd. Then q is a square modulo p if and only if r is
a square modulo p as q = (r^2)^{(t−1)/2} r. So (r/p) = 1, where (·/·) denotes the Legendre
symbol. Solving (6.12) is equivalent to solving x^2 = −p in F_{r^t}. As t is odd, there is a
solution in Fq if and only if the solution is in Fr. This equation reduces to x^2 = 1 if r = 2,
which obviously has a solution x = 1. Assume that r is odd. Thus we have a solution to
x^2 = −p in F_{r^t} if and only if (−p/r) = (−1/r)(p/r) = 1. By
the Law of Quadratic Reciprocity,

(p/r)(r/p) = (−1)^{((r−1)/2)((p−1)/2)}.

Hence as (r/p) = 1,

(−1/r)(p/r) = (−1)^{(r−1)/2} (−1)^{((r−1)/2)((p−1)/2)}

using Lemma 6.2.2. The only time the right-hand side of this equation is not 1 is when
p ≡ 1 (mod 4) and r ≡ −1 (mod 4). In this exceptional case

(p/r) = (−1)^{((r−1)/2)((p−1)/2)} = 1,

showing that x^2 = p has a solution in Fq; hence −1 + γ1^2 p = 0 has a solution γ1 in Fq.
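For t odd, solvability of (6.12) over F_{r^t} reduces to solvability over F_r, so the exceptional case of the lemma can be spot-checked by brute force. A sketch (the ranges are our own choices with p ≡ 1 (mod 4) and r ≡ −1 (mod 4)):

```python
def solvable(p, r):
    """Does 1 + γ^2 p = 0 have a solution γ in F_r?"""
    return any((1 + g * g * p) % r == 0 for g in range(r))

for p in (5, 13, 17, 29):            # p ≡ 1 (mod 4)
    for r in (3, 7, 11, 23):         # r ≡ −1 (mod 4)
        if pow(r % p, (p - 1) // 2, p) == 1:   # consider only r a square mod p
            assert not solvable(p, r)          # (6.12) fails, as the lemma predicts
            # ... but −1 + γ1^2 p = 0 does have a solution γ1 in F_r:
            assert any((g * g * p - 1) % r == 0 for g in range(r))
```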
Combining Theorem 6.6.16 and Lemma 6.6.17, we obtain the following.
Theorem 6.6.18 Let r be a prime so that q = r^t for some positive integer t. Let p be an
odd prime with p ≠ r and assume that q is a square modulo p. Assume that if t is odd, then
either p ≢ 1 (mod 4) or r ≢ −1 (mod 4). Let D1 and D2 be the two odd-like QR codes over
Fq of length p.
(i) If p ≡ −1 (mod 4), then D̃i is self-dual for i = 1 and 2.
(ii) If p ≡ 1 (mod 4), then D̃1 and D̃2 are duals of each other.
This theorem gives the extension except when q = r^t where t is odd, p ≡ 1 (mod 4), and
r ≡ −1 (mod 4). But in that case −1 + γ1^2 p = 0 has a solution γ1 in Fq by Lemma 6.6.17.
If D is an odd-like QR code of length p, define Ď = {č | c ∈ D}, where č = c c∞ =
c0 · · · c_{p−1} c∞ and

c∞ = −γ1 Σ_{j=0}^{p−1} c_j.
Theorem 6.6.19 Let r be a prime so that q = r^t for some odd positive integer t and r ≡ −1
(mod 4). Assume p ≡ 1 (mod 4) is a prime such that q is a square modulo p. Let D1 and
D2 be the two odd-like QR codes over Fq of length p. Then Ď1 and Ď2 D are duals of each
other, where D is the diagonal matrix diag(1, 1, . . . , 1, −1).
Proof: By our assumption and Lemma 6.6.17, −1 + γ1^2 p = 0 has a solution γ1 in Fq. As
p ≡ 1 (mod 4), −1 ∈ Q_p. Let Ci be the even-like subcode of Di. Then Ciµ−1 = Ci, and,
by Theorem 6.4.3, C1^⊥ = D2 and C2^⊥ = D1. The result follows as 1̌ is orthogonal to 1̌D
because −1 + γ1^2 p = 0.
Note that if r ≠ 2, there are two solutions γ of 1 + γ^2 p = 0 over F_{r^t}, one the negative of
the other. Similarly −1 + γ1^2 p = 0 has two solutions. We can use either solution to define
the extensions. When r = 2, we can always choose γ = 1, and hence for the codes over
fields of characteristic 2, D̃ = D̂. We see that when p ≡ −1 (mod 12), the solution of (6.12)
in F3 is γ = 1 and so here also D̃ = D̂. When p ≡ 1 (mod 12), (6.12) has no solution in
F3 but one solution of −1 + γ1^2 p = 0 in F3 is γ1 = 1; thus in this case Ď = D̂.
We now look at the extended codes over fields of characteristic 2 and 3.
Corollary 6.6.20 Let D1 and D2 be odd-like QR codes over F_{r^t} of length p.
(i) Suppose that r = 2. The following hold:
(a) D̃i is self-dual for i = 1 and 2 when p ≡ −1 (mod 8) or when p ≡ 3 (mod 8) with
t even.
(b) D̃1 and D̃2 are duals of each other when p ≡ 1 (mod 8) or when p ≡ −3 (mod 8)
with t even.
(ii) Suppose that r = 3. The following hold:
(a) D̃i is self-dual for i = 1 and 2 when p ≡ −1 (mod 12).
(b) If p ≡ 1 (mod 12), then Ď1 and Ď2 D are duals of each other, where D is the
diagonal matrix diag(1, 1, . . . , 1, −1).
(c) If p ≡ −5 (mod 12) with t even, then D̃i is self-dual for i = 1 and 2, where γ = ρ^2
from Table 6.4.
(d) If p ≡ 5 (mod 12) with t even, then D̃1 and D̃2 are duals of each other.
Exercise 369 Explicitly find the solutions of 1 + γ^2 p = 0 and −1 + γ1^2 p = 0 over F_{2^t}
and F_{3^t}. Then use that information to prove Corollary 6.6.20.
Exercise 370 Do the following:
(a) Give the appropriate extended generating idempotents for all odd-like quadratic residue
codes of prime length p ≤ 29 over F_{2^t}. Distinguish between those that arise when t is
even and when t is odd. Also give the duality relationships between the extended codes.
See Exercise 364.
(b) Give the appropriate extended generating idempotents for all odd-like quadratic residue
codes of prime length p ≤ 29 over F_{3^t}. Distinguish between those that arise when t is
even and when t is odd. Also give the duality relationships between the extended codes.
See Exercise 367.
6.6.4
Automorphisms of extended QR codes
In this section, we briefly present information about the automorphism groups of the extended QR codes. Those interested in a complete description of the automorphism groups
of the extended QR codes (and extended generalized QR codes) should consult either [147]
or [149]. Let Dext be one of the extensions D̂, D̃, or Ď of one of the odd-like QR codes
D, whichever is appropriate. The coordinates of Dext are labeled {0, 1, . . . , p − 1, ∞} =
F_p ∪ {∞}. Obviously, the maps Tg, for g ∈ F_p, given by iTg ≡ i + g (mod p) for all i ∈ F_p
and ∞Tg = ∞ are in PAut(Dext) as these act as cyclic shifts on {0, 1, . . . , p − 1} and fix
∞. Also if a ∈ Q_p, the multiplier µa can be extended by letting ∞µa = ∞; by
Theorem 6.6.3, this is also in PAut(Dext). So far we have not found any automorphisms that
move the coordinate ∞; however, the Gleason–Prange Theorem produces such a map. We
do not prove this result.
Theorem 6.6.21 (Gleason–Prange Theorem) Let Dext be D̂, D̃, or Ď, where D is an
odd-like QR code of length p over Fq. Let P be the permutation matrix given by the
permutation that interchanges ∞ with 0, and also interchanges g with −1/g for g ∈ F_p,
g ≠ 0. Then there is a diagonal matrix D, all of whose diagonal entries are ±1, such that
DP ∈ MAut(Dext).
We have left the exact form of the diagonal matrix D rather vague as it depends on exactly
which generating idempotent is used for the unextended code. Note, however, that if the field
Fq has characteristic 2, then D is the identity matrix. The permutation matrices given by Tg
with g ∈ F_p, µa with a ∈ Q_p, and P generate a group denoted PSL2(p) called the projective
special linear group. In all but three cases, the full automorphism group ΓAut(Dext) is only
slightly bigger (we can add automorphisms related to field automorphisms of Fq). The three
exceptions occur when p = 23 and q = 2^t (where Dext has the same basis as the extended
binary Golay code), when p = 11 and q = 3^t (where Dext has the same basis as the extended
ternary Golay code), and when p = 5 and q = 4^t (where Dext has the same basis as the
hexacode).
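The Gleason–Prange permutation is easy to tabulate for small p. A sketch (function name ours; p = 7 is our illustrative choice, and the string 'inf' stands for the coordinate ∞):

```python
def gleason_prange(p):
    """The permutation interchanging ∞ with 0 and g with −1/g (g ≠ 0) on F_p ∪ {∞}."""
    perm = {'inf': 0, 0: 'inf'}
    for g in range(1, p):
        perm[g] = (-pow(g, p - 2, p)) % p   # −1/g, inverting g via Fermat's little theorem
    return perm

perm = gleason_prange(7)
assert perm['inf'] == 0 and perm[0] == 'inf'
assert perm[1] == 6                                  # −1/1 ≡ 6 (mod 7)
assert all(perm[perm[g]] == g for g in range(1, 7))  # g ↦ −1/g is an involution
```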
Exercise 371 Find the permutation in cycle form corresponding to the permutation matrix
P in the Gleason–Prange Theorem when p = 5, p = 11, and p = 23. Note that these values
of p are those arising in Example 6.6.15.
The automorphism from the Gleason–Prange Theorem together with Tg , for g ∈ F p , and
µa, for a ∈ Q_p, show that MAut(Dext) is transitive. This implies, by Theorem 1.7.13, that
the minimum weight of D is its minimum odd-like weight. Because QR codes are duadic
codes, the Square Root Bound applies. We summarize this in the following.
Theorem 6.6.22 Let D be an odd-like QR code of length p over Fq. The minimum weight
of D is its minimum odd-like weight do. Furthermore, do^2 ≥ p. If p ≡ −1 (mod 4), then
do^2 − do + 1 ≥ p. Additionally, every minimum weight codeword is odd-like. If D is
binary, its minimum weight d = do is odd, and if, in addition, p ≡ −1 (mod 8), then d ≡ 3
(mod 4).
Proof: All statements but the last follow from the Square Root Bound and
Theorem 1.7.13, together with the observation that µ−1 gives the splitting for QR codes
precisely when −1 ∈ N_p, that is when p ≡ −1 (mod 4) by Lemma 6.2.4. If D is binary,
its minimum weight d is odd as odd-like binary vectors have odd weight. By
Theorem 6.6.14, D̂ is doubly-even if p ≡ −1 (mod 8); as D̂ has minimum weight d + 1, d ≡
3 (mod 4).
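The numeric consequences of this theorem for binary QR codes can be sketched as follows (helper name ours): find the smallest d with d^2 − d + 1 ≥ p that is odd, and that satisfies d ≡ 3 (mod 4) when p ≡ −1 (mod 8).

```python
def min_do(p):
    """Smallest admissible odd-like weight for a binary QR code, p ≡ −1 (mod 4)."""
    d = 3
    while d * d - d + 1 < p:          # the bound do^2 - do + 1 >= p
        d += 1
    while d % 2 == 0 or (p % 8 == 7 and d % 4 != 3):
        d += 1                        # d odd; and d ≡ 3 (mod 4) if p ≡ −1 (mod 8)
    return d

assert min_do(23) == 7    # the [23, 12, 7] binary Golay code
assert min_do(47) == 11   # matched by the [47, 24, 11] QR code of Example 6.6.23
```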
Example 6.6.23 Using Theorem 6.6.22, we can easily determine the minimum weight of
the binary odd-like quadratic residue codes of lengths p = 23 and 47 and dimensions 12 and
24, respectively. Let D be such a code. As p ≡ −1 (mod 4), each satisfies do^2 − do + 1 ≥ p
and do is odd. When p = 23, this bound implies that do ≥ 6 and hence do ≥ 7 as it is
odd. The Sphere Packing Bound precludes a higher value of do. By Theorem 6.6.14, D̂
is a [24, 12, 8] self-dual doubly-even code (the extended binary Golay code). When p =
47, do ≥ 8 by the bound and do ≡ 3 (mod 4) by Theorem 6.6.22. In particular, do ≥ 11.
The Sphere Packing Bound shows that do < 15, and as do ≡ 3 (mod 4), do = 11. Thus
D is a [47, 24, 11] code. Using Theorem 6.6.14, D̂ is a [48, 24, 12] self-dual doubly-even
code. At the present time there is no other known [48, 24, d] binary code with
d ≥ 12.
Exercise 372 Find the minimum distance of an odd-like binary QR code of length 31.
Example 6.6.24 Let D be an odd-like QR code over F3 of length p = 11 and dimension 6.
Let D have minimum weight d and D̂ have minimum weight d̂. By Theorem 6.6.14, D̂ is
self-dual as p ≡ −1 (mod 12). Hence, 3 | d̂. By the Singleton Bound, d̂ ≤ 7, implying that
d̂ = 3 or d̂ = 6. By Theorem 6.6.22, d = do ≥ 4. As d ≤ d̂, d̂ = 6. As d ≥ d̂ − 1, d ≥ 5.
The generating idempotent of D has weight 5 by Theorem 6.6.10. Thus D is an [11, 6, 5]
code, and D̂ is a [12, 6, 6] self-dual code (the extended ternary Golay code).
Example 6.6.25 Let D be an odd-like QR code over F4 of length p = 5 and dimension 3.
Suppose that D and D̂ have minimum weights d and d̂, respectively. By Theorem 6.6.14, D̂
is Hermitian self-dual, implying that 2 | d̂. By the Singleton Bound, d ≤ 3 and d̂ ≤ 4. By
Theorem 6.6.22, d = do ≥ 3. Thus d = 3 and d̂ = 4. Therefore D is a [5, 3, 3] code, and
D̂ is a [6, 3, 4] Hermitian self-dual code (the hexacode).
Example 6.6.26 Consider the duadic codes of length n = 17 over F4 . The 4-cyclotomic
cosets modulo 17 are
C0 = {0}, C1 = {1, 4, 16, 13}, C2 = {2, 8, 15, 9}, C3 = {3, 12, 14, 5},
and C6 = {6, 7, 11, 10}.
The odd-like quadratic residue codes D1 and D2 have defining sets Q17 = C1 ∪ C2 and N17 = C3 ∪ C6; the extended codes D̂1 and D̂2 are duals of each other under either the ordinary or Hermitian inner product by Theorem 6.6.14. Both D1 and D2 have binary idempotents by Theorem 6.6.6. By Theorem 6.6.22, D1 and D2 have minimum weight at least 5; as all minimum weight codewords are odd-like, D̂1 and D̂2 have minimum weight at least 6. In fact, both are [18, 9, 6] codes. This information is contained in Table 6.1. There are two other splittings given by S1 = C1 ∪ C3 with S2 = C2 ∪ C6, and S′1 = C1 ∪ C6 with S′2 = C2 ∪ C3. The splittings are interchanged by µ6, yielding equivalent pairs of codes. The odd-like codes D′1 and D′2 with defining sets S1 and S2 have a splitting given by µ−2. By Theorem 6.4.14, D̂′1 and D̂′2 are Hermitian self-dual. It turns out that both are [18, 9, 8] codes; these extended duadic codes have minimum weight higher than that of the extended quadratic residue codes. These are the codes summarized in Table 6.3. It was shown in [148] that a Hermitian self-dual [18, 9, 8] code over F4 is unique. Later, in [249], it was shown that an [18, 9, 8] code over F4 is unique.
We conclude this section by presenting the automorphism groups of the binary extended
odd-like quadratic residue codes. For a proof, see [149]; we discuss (i) in Example 9.6.2
and (ii) in Section 10.1.2.
Theorem 6.6.27 Let p be a prime such that p ≡ ±1 (mod 8). Let D be a binary odd-like quadratic residue code of length p with extended code D̂ of length p + 1. The following hold:
(i) When p = 7, ΓAut(D̂) = PAut(D̂) is isomorphic to the affine group GA3(2) of order 1344.
(ii) When p = 23, ΓAut(D̂) = PAut(D̂) is isomorphic to the Mathieu group M24 of order 244 823 040.
(iii) If p ∉ {7, 23}, then ΓAut(D̂) = PAut(D̂) is isomorphic to the group PSL2(p) of order p(p² − 1)/2.
7 Weight distributions
In Chapter 1 we encountered the notion of the weight distribution of a code. In this chapter
we greatly expand on this concept.
The weight distribution (or weight spectrum) of a code of length n specifies the number of
codewords of each possible weight 0, 1, . . . , n. We generally denote the weight distribution
of a code C by A0 (C), A1 (C), . . . , An (C), or, if the code is understood, by A0 , A1 , . . . , An ,
where Ai = Ai (C) is the number of codewords of weight i. As a code often has many values
where Ai (C) = 0, these values are usually omitted from the list. While the weight distribution
does not in general uniquely determine a code, it does give important information of both
practical and theoretical significance. However, computing the weight distribution of a large
code, even on a computer, can be a formidable problem.
7.1 The MacWilliams equations
A linear code C is uniquely determined by its dual C ⊥ . The most fundamental result about
weight distributions is a set of linear relations between the weight distributions of C and C ⊥
which imply, in particular, that the weight distribution of C is uniquely determined by the
weight distribution of C ⊥ and vice versa. In other words, if we know the weight distribution
of C we can determine the weight distribution of C ⊥ without knowing specifically the
codewords of C ⊥ or anything else about its structure. These linear relations have been the
most significant tool available for investigating and calculating weight distributions. They
were first developed by MacWilliams in [213], and consequently are called the MacWilliams
equations or the MacWilliams identities. Since then there have been variations, most notably
the Pless power moments, which we will also examine.
Let C be an [n, k, d] code over Fq with weight distribution Ai = Ai (C) for 0 ≤ i ≤ n, and
let the weight distribution of C ⊥ be Ai⊥ = Ai (C ⊥ ) for 0 ≤ i ≤ n. The key to developing the
MacWilliams equations is to examine the q k × n matrix M whose rows are the codewords
of C listed in some order. As an illustration of how M can be used, consider the following.
The number of rows of M (i.e. the number of codewords of C) equals q^k, but it also equals $\sum_{i=0}^{n} A_i$. Using the fact that A0⊥ = 1 we obtain the linear equation
$$\sum_{j=0}^{n} A_j = q^k A_0^\perp. \tag{7.1}$$
We next count the total number of zeros in M in two different ways. By counting first by
rows, we see that there are
$$\sum_{j=0}^{n-1} (n-j)A_j$$
zeros in M. By Exercise 373 a column of M either consists entirely of zeros or contains every element of Fq an equal number of times, and additionally M has A1⊥/(q − 1) zero columns. Therefore, counting the number of zeros of M by columns, we see that the number of zeros in M also equals
$$q^k\,\frac{A_1^\perp}{q-1} + q^{k-1}\left(n - \frac{A_1^\perp}{q-1}\right) = q^{k-1}\left(nA_0^\perp + A_1^\perp\right),$$
again using A0⊥ = 1. Equating these two counts, we obtain
$$\sum_{j=0}^{n-1} (n-j)A_j = q^{k-1}\left(nA_0^\perp + A_1^\perp\right). \tag{7.2}$$
Equations (7.1) and (7.2) are the first two equations in the list of n + 1 MacWilliams
equations relating the weight distributions of C and C ⊥ :
$$\sum_{j=0}^{n-\nu} \binom{n-j}{\nu} A_j = q^{k-\nu} \sum_{j=0}^{\nu} \binom{n-j}{n-\nu} A_j^\perp \quad\text{for } 0 \le \nu \le n. \tag{7.3}$$
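Equations (7.3) can be checked by brute force on any small code. The Python sketch below (the [4,1] binary repetition code and the helper names `span`, `dual`, and `weights` are our illustrative choices, not from the text) enumerates a code and its dual and verifies all n + 1 equations.

```python
from itertools import product
from math import comb

def span(gens, q=2):
    """All F_q-linear combinations of the rows of gens."""
    k, n = len(gens), len(gens[0])
    return [tuple(sum(a*g[i] for a, g in zip(coef, gens)) % q for i in range(n))
            for coef in product(range(q), repeat=k)]

def dual(code, q=2):
    """Brute-force dual: all vectors orthogonal to every codeword."""
    n = len(code[0])
    return [v for v in product(range(q), repeat=n)
            if all(sum(x*y for x, y in zip(v, c)) % q == 0 for c in code)]

def weights(code):
    """Weight distribution A_0, ..., A_n of a list of codewords."""
    n = len(code[0])
    A = [0]*(n + 1)
    for c in code:
        A[sum(1 for x in c if x)] += 1
    return A

n, k, q = 4, 1, 2
C = span([(1, 1, 1, 1)])            # the [4,1] binary repetition code
A, Ad = weights(C), weights(dual(C))
for nu in range(n + 1):             # the n + 1 equations (7.3)
    lhs = sum(comb(n - j, nu) * A[j] for j in range(n - nu + 1))
    rhs = q**(k - nu) * sum(comb(n - j, n - nu) * Ad[j] for j in range(nu + 1))
    assert lhs == rhs
print(A, Ad)                        # [1, 0, 0, 0, 1] [1, 0, 6, 0, 1]
```

Here the dual is the [4,3] even-weight code, and the check passes for every ν.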
Exercise 373 Let M be a q^k × n matrix whose rows are the codewords of an [n, k] code C over Fq. Let A1⊥ be the number of codewords in C⊥ of weight 1. Prove that:
(a) a column of M either consists entirely of zeros or contains every element of Fq an equal number of times, and
(b) M has A1⊥/(q − 1) zero columns.
We now show that all the MacWilliams equations follow from a closer examination of
the matrix M. This proof follows [40]. Before presenting the main result, we need two
lemmas. Let the coordinates of the code C be denoted {1, 2, . . . , n}. If I ⊆ {1, 2, . . . , n}, then $\bar I$ will denote the complementary set {1, 2, . . . , n} \ I. Recall that in Section 1.5 we introduced the punctured code $C^{\bar I}$ and the shortened code $C_{\bar I}$, each of length $n - |\bar I| = |I|$. If x ∈ C, $x^{\bar I}$ denotes the codeword in $C^{\bar I}$ obtained from x by puncturing on $\bar I$. Finally, let $A_j^\perp(I)$, for 0 ≤ j ≤ |I|, be the weight distribution of $(C^{\bar I})^\perp$.
Lemma 7.1.1 If 0 ≤ j ≤ ν ≤ n, then
$$\sum_{|I|=\nu} A_j^\perp(I) = \binom{n-j}{n-\nu} A_j^\perp.$$
Proof: Let
X = {(x, I ) | x ∈ C ⊥ , wt(x) = j, |I | = ν, supp(x) ⊆ I },
where supp(x) denotes the support of x. We count the number of elements in X in two different ways. If wt(x) = j with x ∈ C⊥, there are $\binom{n-j}{\nu-j} = \binom{n-j}{n-\nu}$ sets I of size ν ≥ j that contain supp(x). Thus $|X| = \binom{n-j}{n-\nu}A_j^\perp$. There are $A_j^\perp(I)$ vectors $y \in (C^{\bar I})^\perp$ with wt(y) = j. By Theorem 1.5.7, $y \in (C^\perp)_{\bar I}$, and so $y = x^{\bar I}$ for a unique x ∈ C⊥ with wt(x) = j and supp(x) ⊆ I, for each I of size ν. Thus $|X| = \sum_{|I|=\nu} A_j^\perp(I)$, and the result follows by equating the two counts. □
In our second lemma, we consider again the q^k × n matrix M whose rows are the codewords of C, and we let M(I) denote the q^k × |I| submatrix of M consisting of the columns of M indexed by I. Note that each row of M(I) is a codeword of $C^{\bar I}$; however, each codeword of $C^{\bar I}$ may be repeated several times.

Lemma 7.1.2 Each codeword of $C^{\bar I}$ occurs exactly $q^{k-k_I}$ times as a row of M(I), where $k_I$ is the dimension of $C^{\bar I}$.
Proof: The map $f : C \to C^{\bar I}$ given by $f(x) = x^{\bar I}$ (i.e. the puncturing map) is linear and surjective. So its kernel has dimension $k - k_I$ and the result follows. □

Exercise 374 Prove that $f : C \to C^{\bar I}$ given by $f(x) = x^{\bar I}$, used in the proof of Lemma 7.1.2, is indeed linear and surjective.
The verification of the n + 1 equations of (7.3) is completed by counting the ν-tuples of
zeros in the rows of M for each 0 ≤ ν ≤ n in two different ways.
Theorem 7.1.3 The equations of (7.3) are satisfied by the weight distributions of an [n, k]
code C over Fq and its dual.
Proof: Let Nν be the number of ν-tuples of zeros (not necessarily consecutive) in the rows of M for 0 ≤ ν ≤ n. Clearly, a row of M of weight j has $\binom{n-j}{\nu}$ ν-tuples of zeros. Thus
$$N_\nu = \sum_{j=0}^{n-\nu} \binom{n-j}{\nu} A_j,$$
which is the left-hand side of (7.3). By Lemma 7.1.1, the right-hand side of (7.3) is
$$q^{k-\nu}\sum_{j=0}^{\nu}\binom{n-j}{n-\nu}A_j^\perp \;=\; q^{k-\nu}\sum_{j=0}^{\nu}\,\sum_{|I|=\nu}A_j^\perp(I) \;=\; q^{k-\nu}\sum_{|I|=\nu}\,\sum_{j=0}^{\nu}A_j^\perp(I).$$
But $\sum_{j=0}^{\nu}A_j^\perp(I) = q^{\nu-k_I}$, as $(C^{\bar I})^\perp$ is a [ν, ν − k_I] code. Therefore
$$q^{k-\nu}\sum_{j=0}^{\nu}\binom{n-j}{n-\nu}A_j^\perp \;=\; q^{k-\nu}\sum_{|I|=\nu}q^{\nu-k_I} \;=\; \sum_{|I|=\nu}q^{k-k_I}.$$
By Lemma 7.1.2, $q^{k-k_I}$ is the number of zero rows of M(I), which is the number of |I|-tuples of zeros in the coordinate positions I in M. Thus $\sum_{|I|=\nu}q^{k-k_I} = N_\nu$. Equating the two counts for Nν completes the proof. □
Corollary 7.1.4 The weight distribution of C uniquely determines the weight distribution
of C ⊥ .
Proof: The (n + 1) × (n + 1) coefficient matrix of the A⊥j s on the right-hand side of (7.3)
is triangular with nonzero entries on the diagonal. Hence the A⊥j s are uniquely determined
by the A j s.
7.2 Equivalent formulations
There are several equivalent sets of equations that can be used in place of the MacWilliams
equations (7.3). We state five more of these after introducing some additional notation. One
of these equivalent formulations is most easily expressed by considering the generating
polynomial of the weight distribution of the code. This polynomial has one of two forms:
it is either a polynomial in a single variable of degree at most n, the length of the code, or
it is a homogeneous polynomial in two variables of degree n. Both are called the weight
enumerator of the code C and are denoted WC (x) or WC (x, y). The single variable weight
enumerator of C is
$$W_C(x) = \sum_{i=0}^{n} A_i(C)\,x^i.$$
By replacing x by x/y and then multiplying by y^n, W_C(x) can be converted to the two-variable weight enumerator
$$W_C(x, y) = \sum_{i=0}^{n} A_i(C)\,x^i y^{n-i}.$$
Example 7.2.1 Consider the two binary codes C 1 and C 2 of Example 1.4.4 and Exercise 17
with generator matrices G1 and G2, respectively, where
$$G_1 = \begin{bmatrix} 1&1&0&0&0&0\\ 0&0&1&1&0&0\\ 0&0&0&0&1&1 \end{bmatrix} \quad\text{and}\quad G_2 = \begin{bmatrix} 1&1&0&0&0&0\\ 0&1&1&0&0&0\\ 1&1&1&1&1&1 \end{bmatrix}.$$
Both codes have weight distribution A0 = 1, A2 = 3, A4 = 3, and A6 = 1. Hence
WC1 (x, y) = WC2 (x, y) = y 6 + 3x 2 y 4 + 3x 4 y 2 + x 6 = (y 2 + x 2 )3 .
Notice that C 1 = C ⊕ C ⊕ C, where C is the [2, 1, 2] binary repetition code, and that
WC (x, y) = y 2 + x 2 .
The form of the weight enumerator of C 1 from the previous example illustrates a general
fact about the weight enumerator of the direct sum of two codes.
Theorem 7.2.2 The weight enumerator of the direct sum C 1 ⊕ C 2 is
WC1 ⊕C2 (x, y) = WC1 (x, y)WC2 (x, y).
Exercise 375 Prove Theorem 7.2.2.
The simplicity of the form of the weight enumerator of a direct sum shows the power of
the notation.
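Theorem 7.2.2 also has a computational reading: in coefficient form, taking a direct sum multiplies weight enumerators, i.e. convolves their coefficient lists. A short Python sketch (the function name is ours) reproduces Example 7.2.1 from the [2,1,2] repetition code:

```python
def direct_sum_wd(A1, A2):
    """Weight distribution of C1 (+) C2: the coefficient convolution
    corresponding to the product W_C1(x, y) * W_C2(x, y)."""
    out = [0]*(len(A1) + len(A2) - 1)
    for i, a in enumerate(A1):
        for j, b in enumerate(A2):
            out[i + j] += a*b
    return out

W = [1, 0, 1]                          # [2,1,2] repetition code: y^2 + x^2
W3 = direct_sum_wd(direct_sum_wd(W, W), W)
print(W3)                              # [1, 0, 3, 0, 3, 0, 1] = (y^2 + x^2)^3
```

The output matches the distribution A0 = 1, A2 = 3, A4 = 3, A6 = 1 found in Example 7.2.1.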
Shortly we will list six equivalent forms of the MacWilliams equations, with (7.3) denoted
(M1 ). This form, as well as the next, denoted (M2 ), involves only the weight distributions
and binomial coefficients. Both (M1 ) and (M2 ) consist of a set of n + 1 equations. The third
form, denoted (M3 ), is a single identity involving the weight enumerators of a code and its
dual. Although we originally called (M1 ) the MacWilliams equations, in fact, any of these
three forms is generally referred to as the MacWilliams equations. The fourth form is the
actual solution of the MacWilliams equations for the weight distribution of the dual code
in terms of the weight distribution of the original code; by Corollary 7.1.4 this solution
is unique. In order to give the solution in compact form, we recall the definition of the
Krawtchouck polynomial
$$K_k^{n,q}(x) = \sum_{j=0}^{k} (-1)^j (q-1)^{k-j} \binom{x}{j}\binom{n-x}{k-j} \quad\text{for } 0 \le k \le n$$
of degree k in x given in Chapter 2. The n + 1 equations that arise will be denoted (K).
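For integer arguments the Krawtchouck polynomial is a finite sum of binomial products and is easy to evaluate directly; a minimal Python sketch (function name ours):

```python
from math import comb

def krawtchouk(k, x, n, q):
    """K_k^{n,q}(x) = sum_j (-1)^j (q-1)^(k-j) C(x,j) C(n-x,k-j),
    evaluated at a nonnegative integer x."""
    return sum((-1)**j * (q - 1)**(k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

# At x = 0 only the j = 0 term survives, so K_k(0) = (q-1)^k C(n,k):
assert krawtchouk(2, 0, 7, 2) == comb(7, 2)
assert krawtchouk(3, 0, 10, 4) == 3**3 * comb(10, 3)
```

These evaluations are all that is needed to apply form (K) below.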
The final two sets of equations equivalent to (7.3), which are sometimes more convenient
for calculations, involve the Stirling numbers S(r, ν) of the second kind. These are defined
for nonnegative integers r, ν by the equation
$$S(r, \nu) = \frac{1}{\nu!}\sum_{i=0}^{\nu} (-1)^{\nu-i}\binom{\nu}{i}\,i^r;$$
in addition, they satisfy the recursion
$$S(r, \nu) = \nu S(r-1, \nu) + S(r-1, \nu-1) \quad\text{for } 1 \le \nu < r.$$
The number ν!S(r, ν) equals the number of ways to distribute r distinct objects into ν distinct boxes with no box left empty; in particular,
$$S(r, \nu) = 0 \quad\text{if } r < \nu \tag{7.4}$$
and
$$S(r, r) = 1. \tag{7.5}$$
The following is a basic identity for the Stirling numbers of the second kind (see [204]):
$$j^r = \sum_{\nu=0}^{r} \binom{j}{\nu}\,\nu!\,S(r, \nu). \tag{7.6}$$
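Both the defining formula and the recursion are immediate to program, and identity (7.6) can then be checked mechanically; a small Python sketch (assuming nothing beyond the formulas above):

```python
from math import comb, factorial

def stirling2(r, v):
    """S(r, v) from the defining inclusion-exclusion formula."""
    return sum((-1)**(v - i) * comb(v, i) * i**r for i in range(v + 1)) // factorial(v)

# the recursion S(r, v) = v S(r-1, v) + S(r-1, v-1):
for r in range(2, 7):
    for v in range(1, r):
        assert stirling2(r, v) == v*stirling2(r - 1, v) + stirling2(r - 1, v - 1)

# identity (7.6): j^r = sum_v C(j, v) v! S(r, v):
for j in range(6):
    for r in range(1, 6):
        assert j**r == sum(comb(j, v)*factorial(v)*stirling2(r, v) for v in range(r + 1))
```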
The two forms of the equations arising here involve the weight distributions, binomial
coefficients, and the Stirling numbers. They will be called the Pless power moments and
denoted (P1 ) and (P2 ); they are due to Pless [258]. Each of (P1 ) and (P2 ) is an infinite set
of equations.
Exercise 376 Compute a table of Stirling numbers S(r, ν) of the second kind for 1 ≤ ν ≤
r ≤ 6.
We now state the six families of equations relating the weight distribution of an [n, k]
code C over Fq to the weight distribution of its dual C ⊥ . The theorem that follows asserts
their equivalence.
$$\sum_{j=0}^{n-\nu}\binom{n-j}{\nu}A_j \;=\; q^{k-\nu}\sum_{j=0}^{\nu}\binom{n-j}{n-\nu}A_j^\perp \quad\text{for } 0\le\nu\le n. \tag{$M_1$}$$

$$\sum_{j=\nu}^{n}\binom{j}{\nu}A_j \;=\; q^{k-\nu}\sum_{j=0}^{\nu}(-1)^j\binom{n-j}{n-\nu}(q-1)^{\nu-j}A_j^\perp \quad\text{for } 0\le\nu\le n. \tag{$M_2$}$$

$$W_{C^\perp}(x, y) \;=\; \frac{1}{|C|}\,W_C\bigl(y-x,\; y+(q-1)x\bigr). \tag{$M_3$}$$

$$A_j^\perp \;=\; \frac{1}{|C|}\sum_{i=0}^{n} A_i\,K_j^{n,q}(i) \quad\text{for } 0\le j\le n. \tag{$K$}$$

$$\sum_{j=0}^{n} j^r A_j \;=\; \sum_{j=0}^{\min\{n,r\}}(-1)^j A_j^\perp\left[\sum_{\nu=j}^{r}\nu!\,S(r,\nu)\,q^{k-\nu}(q-1)^{\nu-j}\binom{n-j}{n-\nu}\right] \quad\text{for } 0\le r. \tag{$P_1$}$$

$$\sum_{j=0}^{n}(n-j)^r A_j \;=\; \sum_{j=0}^{\min\{n,r\}} A_j^\perp\left[\sum_{\nu=j}^{r}\nu!\,S(r,\nu)\,q^{k-\nu}\binom{n-j}{n-\nu}\right] \quad\text{for } 0\le r. \tag{$P_2$}$$
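Form (M3) is the easiest to test numerically: both sides are polynomials, so it suffices to compare them at a handful of points. The Python sketch below does this for the ternary [4,2,3] tetracode (our choice of example; the generator matrix is one standard form):

```python
from itertools import product

q, n = 3, 4
G = [(1, 0, 1, 1), (0, 1, 1, 2)]    # a generator matrix of the [4,2,3] tetracode
C = [tuple(sum(a*g[i] for a, g in zip(ab, G)) % q for i in range(n))
     for ab in product(range(q), repeat=2)]
Cd = [v for v in product(range(q), repeat=n)
      if all(sum(s*t for s, t in zip(v, c)) % q == 0 for c in C)]

def W(code, x, y):
    """Two-variable weight enumerator W_code(x, y)."""
    return sum(x**sum(1 for s in c if s) * y**sum(1 for s in c if not s) for c in code)

# (M3): |C| * W_{C^perp}(x, y) = W_C(y - x, y + (q-1)x), checked at sample points
for x, y in [(1, 2), (2, 5), (-1, 3), (0, 1)]:
    assert W(Cd, x, y) * len(C) == W(C, y - x, y + (q - 1)*x)
```

Since the two sides are homogeneous polynomials of degree n, agreement at enough sample points forces the identity, which makes such spot checks a reliable regression test when coding these transforms.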
In showing the equivalence of these six families of equations, we leave many details as
exercises.
Theorem 7.2.3 The sets of equations (M1 ), (M2 ), (M3 ), (K ), (P1 ), and (P2 ) are equivalent.
Proof: Exercise 377 shows that by expanding the right-hand side of (M3 ) and equating
coefficients, equations (K ) arise; reversing the steps will give (M3 ) from (K ). Replacing y
by x + z in (M3 ), expanding, and equating coefficients gives (M1 ) as Exercise 378 shows;
again the steps can be reversed so that (M1 ) will produce (M3 ). Exercise 379 shows that
by replacing C by C ⊥ (and C ⊥ by C) and x by y + z in (M3 ), expanding, and equating
coefficients gives (M2 ); again the steps are reversible so that (M2 ) will produce (M3 ) with
C and C ⊥ interchanged. Thus the first four families are equivalent.
We next prove that (M2) and (P1) are equivalent. Notice that in (M2), the summation on the left-hand side can begin with j = 0 as $\binom{j}{\nu} = 0$ if j < ν. Using (M2) and (7.6), we calculate that
$$\begin{aligned}
\sum_{j=0}^{n} j^r A_j &= \sum_{j=0}^{n} \sum_{\nu=0}^{r} \nu!\,S(r,\nu)\binom{j}{\nu} A_j \\
&= \sum_{\nu=0}^{r} \nu!\,S(r,\nu) \sum_{j=0}^{n} \binom{j}{\nu} A_j \\
&= \sum_{\nu=0}^{r} \nu!\,S(r,\nu)\, q^{k-\nu} \sum_{j=0}^{\nu} (-1)^j (q-1)^{\nu-j}\binom{n-j}{n-\nu} A_j^\perp \\
&= \sum_{j=0}^{n} (-1)^j A_j^\perp \sum_{\nu=0}^{r} \nu!\,S(r,\nu)\, q^{k-\nu} (q-1)^{\nu-j}\binom{n-j}{n-\nu}.
\end{aligned}$$
The last expression is the right-hand side of (P1) if we note that $\binom{n-j}{n-\nu} = 0$ if n − j < n − ν, or equivalently ν < j. This allows us to start the inner sum at ν = j rather than ν = 0. As ν ≤ r, we have ν < j if r < j, which allows us to stop the outer sum at min{n, r}. Thus (M2) holding implies that (P1) holds. The converse follows from the preceding equations if we know that the n + 1 equations
$$\sum_{\nu=0}^{r} \nu!\,S(r,\nu)\,x_\nu = 0 \quad\text{for } 0 \le r \le n$$
have only the solution x0 = x1 = · · · = xn = 0. This is clear as the (n + 1) × (n + 1) matrix [a_{r,ν}] = [ν!S(r, ν)] is lower triangular by (7.4) with nonzero diagonal entries by (7.5). So (M2) and (P1) are equivalent.
A similar argument, which you are asked to give in Exercise 380 using (7.6) with j
replaced by n − j, gives the equivalence of (M1 ) and (P2 ). Thus all six sets of equations
are equivalent.
Exercise 377 Show that by expanding the right-hand side of (M3 ) and equating coefficients,
equations (K ) arise. Do this in such a way that it is easy to see that reversing the steps will
give (M3 ) from (K ).
Exercise 378 Show that by replacing y by x + z in (M3 ), expanding, and equating coefficients, you obtain (M1 ). Do this in such a way that the steps can be reversed so that (M1 )
will produce (M3 ).
Exercise 379 Show that by reversing the roles of C and C ⊥ in (M3 ) and then replacing x
by y + z, expanding, and equating coefficients produces (M2 ). Do this in such a way that
the steps can be reversed so that (M2 ) will yield (M3 ) with C and C ⊥ interchanged.
Exercise 380 Using (7.6) with j replaced by n − j, show that beginning with (M1 ), you
can obtain (P2 ). Prove that beginning with (P2 ), you can arrive at (M1 ) in an analogous way
to what was done in the proof of Theorem 7.2.3.
In the proof of Theorem 7.2.3, we interchanged the roles of C and C ⊥ in (M3 ), thus
reversing the roles of the Ai s and Ai⊥ s. You can obviously reverse the Ai s and Ai⊥ s in any
of the other equivalent forms as long as the dimension k of C is replaced by n − k, the
dimension of C ⊥ . You are asked in Exercise 381 to do precisely that.
Exercise 381 Write the five families of equations corresponding to (M1 ), (M2 ), (K ), (P1 ),
and (P2 ) with the roles of C and C ⊥ reversed.
Exercise 382 Let C be the [5, 2] binary code generated by
$$G = \begin{bmatrix} 1&1&0&0&0\\ 0&1&1&1&1 \end{bmatrix}.$$
(a) Find the weight distribution of C.
(b) Use one of (M1 ), (M2 ), (M3 ), (K ), (P1 ), or (P2 ) to find the weight distribution of C ⊥ .
(c) Verify your result in (b) by listing the vectors in C ⊥ .
259
7.3 A uniqueness result
Exercise 383 Let C be an [n, k] code over F4 .
(a) Show that the number of vectors of weight n in C⊥ is $\sum_{i=0}^{n} A_i\,3^{n-i}(-1)^i$. Hint: Use (M3).
(b) Show that if C has only even weight codewords, then C ⊥ contains a vector of
weight n.
7.3 A uniqueness result
The Pless power moments are particularly useful in showing uniqueness of certain solutions
to any of the families of identities from the last section. This is illustrated in the next result,
important for showing the existence of block designs in Chapter 8.
Theorem 7.3.1 Let S ⊆ {1, 2, . . . , n} with |S| = s. Then the weight distributions of C and C⊥ are uniquely determined by $A_1^\perp, A_2^\perp, \ldots, A_{s-1}^\perp$ and the $A_i$ with $i \in \bar S$. These values can be found from the first s equations in (P1).
Proof: Assume that $A_1^\perp, A_2^\perp, \ldots, A_{s-1}^\perp$ and $A_i$ for $i \in \bar S$ are known. The right-hand sides of the first s equations of (P1), for 0 ≤ r < s, depend only on $A_0^\perp = 1, A_1^\perp, \ldots, A_{s-1}^\perp$. If we move the terms $j^r A_j$ for $j \in \bar S$ from the left-hand side to the right-hand side, we are left with s linear equations in the unknowns $A_j$ with j ∈ S. The coefficient matrix of these s equations is an s × s Vandermonde matrix
$$\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ j_1 & j_2 & j_3 & \cdots & j_s \\ j_1^2 & j_2^2 & j_3^2 & \cdots & j_s^2 \\ \vdots & & & & \vdots \\ j_1^{s-1} & j_2^{s-1} & j_3^{s-1} & \cdots & j_s^{s-1} \end{bmatrix},$$
where S = {j1, . . . , js}. This matrix is nonsingular by Lemma 4.5.1 and hence the Ai with i ∈ S are determined. Thus all the Ai s are known, implying that the remaining Ai⊥ s are now uniquely determined by (K). □
Theorem 7.3.1 is often applied when s is the minimum weight of C⊥ and S = {1, 2, . . . , s}, so that $A_1^\perp = A_2^\perp = \cdots = A_{s-1}^\perp = 0$.
For ease of use, we compute the first five power moments from (P1):
$$\sum_{j=0}^{n} A_j = q^k,$$
$$\sum_{j=0}^{n} jA_j = q^{k-1}\left(qn - n - A_1^\perp\right),$$
$$\sum_{j=0}^{n} j^2A_j = q^{k-2}\left[(q-1)n(qn-n+1) - (2qn-q-2n+2)A_1^\perp + 2A_2^\perp\right],$$
$$\begin{aligned} \sum_{j=0}^{n} j^3A_j = q^{k-3}\bigl[&(q-1)n(q^2n^2 - 2qn^2 + 3qn - q + n^2 - 3n + 2) \\ &- (3q^2n^2 - 3q^2n - 6qn^2 + 12qn + q^2 - 6q + 3n^2 - 9n + 6)A_1^\perp \\ &+ 6(qn - q - n + 2)A_2^\perp - 6A_3^\perp\bigr], \end{aligned} \tag{7.7}$$
$$\begin{aligned} \sum_{j=0}^{n} j^4A_j = q^{k-4}\bigl[&(q-1)n(q^3n^3 - 3q^2n^3 + 6q^2n^2 - 4q^2n + q^2 + 3qn^3 - 12qn^2 \\ &\qquad + 15qn - 6q - n^3 + 6n^2 - 11n + 6) \\ &- (4q^3n^3 - 6q^3n^2 + 4q^3n - q^3 - 12q^2n^3 + 36q^2n^2 - 38q^2n + 14q^2 \\ &\qquad + 12qn^3 - 54qn^2 + 78qn - 36q - 4n^3 + 24n^2 - 44n + 24)A_1^\perp \\ &+ (12q^2n^2 - 24q^2n + 14q^2 - 24qn^2 + 84qn - 72q + 12n^2 - 60n + 72)A_2^\perp \\ &- (24qn - 36q - 24n + 72)A_3^\perp + 24A_4^\perp\bigr]. \end{aligned}$$
In the binary case these become:
$$\sum_{j=0}^{n} A_j = 2^k,$$
$$\sum_{j=0}^{n} jA_j = 2^{k-1}\left(n - A_1^\perp\right),$$
$$\sum_{j=0}^{n} j^2A_j = 2^{k-2}\left[n(n+1) - 2nA_1^\perp + 2A_2^\perp\right], \tag{7.8}$$
$$\sum_{j=0}^{n} j^3A_j = 2^{k-3}\left[n^2(n+3) - (3n^2+3n-2)A_1^\perp + 6nA_2^\perp - 6A_3^\perp\right],$$
$$\begin{aligned} \sum_{j=0}^{n} j^4A_j = 2^{k-4}\bigl[&n(n+1)(n^2+5n-2) - 4n(n^2+3n-2)A_1^\perp \\ &+ 4(3n^2+3n-4)A_2^\perp - 24nA_3^\perp + 24A_4^\perp\bigr]. \end{aligned}$$
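These moments make quick consistency tests for a claimed weight distribution. Below, a Python sketch checks the first three binary equations of (7.8) against the [8,4,4] extended Hamming code, which is self-dual so Ai⊥ = Ai (the generator matrix is one standard choice, an assumption for illustration):

```python
from itertools import product

n, k = 8, 4
G = [(1,1,1,1,0,0,0,0), (0,0,1,1,1,1,0,0),
     (0,0,0,0,1,1,1,1), (1,0,1,0,1,0,1,0)]   # an [8,4,4] self-dual binary code
C = [tuple(sum(a*g[i] for a, g in zip(coef, G)) % 2 for i in range(n))
     for coef in product(range(2), repeat=k)]
A = [0]*(n + 1)
for c in C:
    A[sum(c)] += 1                            # binary: weight = coordinate sum

# self-dual, so A_i^perp = A_i; first three power moments of (7.8):
A1, A2 = A[1], A[2]
assert sum(A) == 2**k
assert sum(j*A[j] for j in range(n + 1)) == 2**(k-1) * (n - A1)
assert sum(j*j*A[j] for j in range(n + 1)) == 2**(k-2) * (n*(n+1) - 2*n*A1 + 2*A2)
print(A)   # [1, 0, 0, 0, 14, 0, 0, 0, 1]
```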
Example 7.3.2 Let C be any self-dual [12, 6, 6] ternary code. As we will see in Section 10.4.1, the only such code turns out to be the extended ternary Golay code G12 from Section 1.9.2. Since the weight of each codeword is a multiple of 3 by Theorem 1.4.5,
Ai = 0 for i ≠ 0, 6, 9, 12. By self-duality, Ai = Ai⊥ for all i. As A0 = 1, only A6, A9, and A12 are unknown. Using Theorem 7.3.1 with s = 3 and S = {6, 9, 12}, as A1⊥ = A2⊥ = 0, we can find these from the first three equations of (7.7):
A6 + A9 + A12 = 728,
6A6 + 9A9 + 12A12 = 5832,
36A6 + 81A9 + 144A12 = 48 600.
The unique solution is A6 = 264, A9 = 440, and A12 = 24. Thus the weight enumerator of
C, the extended ternary Golay code, is
WC (x, y) = y 12 + 264x 6 y 6 + 440x 9 y 3 + 24x 12 .
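The 3 × 3 system in Example 7.3.2 can be solved exactly; a Python sketch using Gauss–Jordan elimination over the rationals:

```python
from fractions import Fraction

# The system from Example 7.3.2 for the [12,6,6] ternary Golay code:
#    A6 +   A9 +    A12 =   728
#   6A6 +  9A9 +  12A12 =  5832
#  36A6 + 81A9 + 144A12 = 48600
M = [[1, 1, 1, 728], [6, 9, 12, 5832], [36, 81, 144, 48600]]
M = [[Fraction(x) for x in row] for row in M]
for i in range(3):                       # Gauss-Jordan elimination
    M[i] = [x / M[i][i] for x in M[i]]
    for r in range(3):
        if r != i:
            M[r] = [a - M[r][i]*b for a, b in zip(M[r], M[i])]
A6, A9, A12 = (M[i][3] for i in range(3))
print(A6, A9, A12)   # 264 440 24
```

This recovers the unique solution A6 = 264, A9 = 440, A12 = 24 stated in the text.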
Exercise 384 Let C be any self-dual doubly-even binary [24, 12, 8] code. (We will see
in Section 10.1.1 that the only such code is the extended binary Golay code G 24 from
Section 1.9.1.)
(a) Show that A0 = A24 = 1 and the only unknown Ai s are A8 , A12 , and A16 .
(b) Using (7.8), show that
WC (x, y) = y 24 + 759x 8 y 16 + 2576x 12 y 12 + 759x 16 y 8 + x 24 .
(c) By Theorem 1.4.5(iii), A8 = A16 . Show that if we only use the first two equations of
(P1 ) together with A8 = A16 , we do not obtain a unique solution. Thus adding the
condition A8 = A16 does not reduce the number of power moment equations required
to find the weight distribution. See also Exercise 385.
Exercise 385 Let C be a binary code of length n where Ai = An−i for all 0 ≤ i ≤ n/2.
(a) Show that the first two equations of (P1 ) are equivalent under the conditions Ai = An−i .
(b) What can be said about the third and fourth equations of (P1 ) under the conditions
Ai = An−i ?
Example 7.3.3 Let C be a [16, 8, 4] self-dual doubly-even binary code. By Theorem 1.4.5(iv), A0 = A16 = 1. Thus in the weight distribution, only A4, A8, and A12 are unknown as Ai = 0 when 4 ∤ i. In Exercise 386, you will be asked to find these values by solving the first three equations in (7.8). We present an alternate solution. Letting s = 3 and S = {4, 8, 12}, there is a unique solution by Theorem 7.3.1 since A1⊥ = A2⊥ = 0. Thus if we can find one [16, 8, 4] self-dual doubly-even binary code, its weight enumerator will give this unique solution. Let C1 = Ĥ3, which is the unique [8, 4, 4] code (see Exercise 56) and has weight enumerator WC1(x, y) = y^8 + 14x^4y^4 + x^8 (see Exercise 19 and Example 1.11.7). By Section 1.5.4, C1 ⊕ C1 is a [16, 8, 4] code. By Theorem 7.2.2,
$$W_{C_1 \oplus C_1}(x, y) = (y^8 + 14x^4y^4 + x^8)^2 = y^{16} + 28x^4y^{12} + 198x^8y^8 + 28x^{12}y^4 + x^{16}.$$
As all weights in C1 ⊕ C1 are divisible by 4, this code is self-dual doubly-even by Theorem 1.4.8; thus we have produced the weight enumerator of C. We will explore this example more extensively later in this chapter.
Exercise 386 Find the weight enumerator of a [16, 8, 4] self-dual doubly-even binary code
by solving the first three equations in (7.8).
Example 7.3.4 Let C be the [n, k] simplex code over Fq where n = (q^k − 1)/(q − 1). These codes were developed in Section 1.8; the dual code C⊥ is the [n, n − k, 3] Hamming code Hq,k. By Theorem 2.7.5, the weight distribution of C is A0 = 1, $A_{q^{k-1}} = q^k - 1$, and Ai = 0 for all other i. Therefore the weight distribution A0⊥, A1⊥, . . . , An⊥ of Hq,k can be determined. For example, using (K) we obtain
$$A_j^\perp = q^{-k}\left[K_j^{n,q}(0) + (q^k - 1)K_j^{n,q}(q^{k-1})\right] = q^{-k}\left[(q-1)^j\binom{n}{j} + (q^k - 1)K_j^{n,q}(q^{k-1})\right]$$
for 0 ≤ j ≤ n.
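Evaluating this formula is mechanical. The Python sketch below (for q = 2 and k = 3, so n = 7; the Krawtchouck evaluator is ours) recovers the weight distribution of the [7,4,3] binary Hamming code:

```python
from math import comb

def krawtchouk(k, x, n, q):
    """K_k^{n,q}(x) for a nonnegative integer x."""
    return sum((-1)**j * (q - 1)**(k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

q, k = 2, 3
n = (q**k - 1) // (q - 1)                  # n = 7
# A_j^perp = q^-k [ K_j(0) + (q^k - 1) K_j(q^(k-1)) ]
Aperp = [(krawtchouk(j, 0, n, q) + (q**k - 1)*krawtchouk(j, q**(k-1), n, q)) // q**k
         for j in range(n + 1)]
print(Aperp)   # [1, 0, 0, 7, 7, 0, 0, 1] -- the [7,4,3] binary Hamming code
```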
Exercise 387 Find the weight enumerator of the [15, 11, 3] binary Hamming code.
To conclude this section, we use (7.7) to produce an interesting result about ternary codes,
due to Kennedy [164].
Theorem 7.3.5 Let C be an [n, k] ternary code with k ≥ 3 and weight distribution
A0 , A1 , . . . , An . For 0 ≤ i ≤ 2, let
$$N_i = \sum_{j \equiv i\ (\mathrm{mod}\ 3)} A_j.$$
Then N0 ≡ N1 ≡ N2 ≡ 0 (mod 3).
Proof: Because k ≥ 3, the first three equations of (7.7) with q = 3 yield
$$N_0 + N_1 + N_2 \equiv \sum_{i=0}^{n} A_i \equiv 0 \ (\mathrm{mod}\ 3),$$
$$N_1 - N_2 \equiv \sum_{i=0}^{n} iA_i \equiv 0 \ (\mathrm{mod}\ 3),$$
$$N_1 + N_2 \equiv \sum_{i=0}^{n} i^2A_i \equiv 0 \ (\mathrm{mod}\ 3),$$
since i ≡ 0, 1, or −1 and i² ≡ 0 or 1 modulo 3. The only solution modulo 3 is N0 ≡ N1 ≡ N2 ≡ 0 (mod 3). □
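Theorem 7.3.5 is easy to observe on a small example; here (our choice of illustration) the [4,3] ternary code of all vectors with zero coordinate sum:

```python
from itertools import product

# the [4,3] ternary code of all vectors with coordinate sum 0 mod 3
C = [v for v in product(range(3), repeat=4) if sum(v) % 3 == 0]
N = [0, 0, 0]
for c in C:
    N[sum(1 for s in c if s) % 3] += 1    # bucket codewords by weight mod 3
print(N)                                  # [9, 6, 12]: each N_i divisible by 3
assert all(x % 3 == 0 for x in N)
```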
Exercise 388 Show that the following weight distributions for ternary codes cannot occur:
(a) A0 = 1, A5 = 112, A6 = 152, A8 = 330, A9 = 110, A11 = 24.
(b) A0 = 1, A3 = 22, A4 = 42, A5 = 67, A6 = 55, A7 = 32, A8 = 24.
7.4 MDS codes
In this section, we show that the weight distribution of an [n, k] MDS code over Fq is
determined by the parameters n, k, and q. One proof of this uses the MacWilliams equations;
see [275]. We present a proof here that uses the inclusion–exclusion principle.
Theorem 7.4.1 Let C be an [n, k, d] MDS code over Fq. The weight distribution of C is given by A0 = 1, Ai = 0 for 1 ≤ i < d, and
$$A_i = \binom{n}{i}\sum_{j=0}^{i-d} (-1)^j \binom{i}{j}\left(q^{i+1-d-j} - 1\right) \quad\text{for } d \le i \le n,$$
where d = n − k + 1.
Proof: Clearly, A0 = 1 and Ai = 0 for 1 ≤ i < d. Let C_T be the code of length n − |T| obtained from C by shortening on T. By repeated application of Exercise 317, C_T is an [n − |T|, k − |T|] MDS code if |T| < k, and C_T is the zero code if |T| ≥ k. Let C(T) be the subcode of C which is zero on T. Then C_T is C(T) punctured on T. In particular,
$$|C(T)| = \begin{cases} q^{k-t} & \text{if } t = |T| < k, \\ 1 & \text{if } t = |T| \ge k, \end{cases} \tag{7.9}$$
and C(T) is the set of codewords of C whose support is disjoint from T. For 0 ≤ t ≤ n, define
$$N_t = \sum_{|T|=t} |C(T)|.$$
By (7.9),
$$N_t = \begin{cases} \binom{n}{t}q^{k-t} & \text{if } t < k, \\[2pt] \binom{n}{t} & \text{if } t \ge k. \end{cases} \tag{7.10}$$
Note that N_t counts the number of codewords of C that have weight n − t or less, with a codeword multiply counted, once for every C(T) it is in. By the inclusion–exclusion principle [204], for d ≤ i ≤ n,
$$A_i = \sum_{j=0}^{i} (-1)^j \binom{n-i+j}{j} N_{n-i+j}.$$
By (7.10),
$$\begin{aligned} A_i &= \sum_{j=0}^{k+i-n-1} (-1)^j \binom{n-i+j}{j}\binom{n}{n-i+j}\, q^{k-(n-i+j)} + \sum_{j=k+i-n}^{i} (-1)^j \binom{n-i+j}{j}\binom{n}{n-i+j} \\ &= \binom{n}{i}\left[\sum_{j=0}^{i-d} (-1)^j \binom{i}{j}\, q^{i+1-d-j} + \sum_{j=i-d+1}^{i} (-1)^j \binom{i}{j}\right], \end{aligned}$$
as d = n − k + 1 and $\binom{n-i+j}{j}\binom{n}{n-i+j} = \binom{n}{i}\binom{i}{j}$. As
$$\sum_{j=i-d+1}^{i} (-1)^j \binom{i}{j} = -\sum_{j=0}^{i-d} (-1)^j \binom{i}{j},$$
the result follows. □
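Theorem 7.4.1 turns the weight distribution of any MDS code into a closed-form computation; a Python sketch (function name ours):

```python
from math import comb

def mds_weight_distribution(n, k, q):
    """A_0, ..., A_n for an [n, k] MDS code over F_q (Theorem 7.4.1); d = n - k + 1."""
    d = n - k + 1
    A = [0]*(n + 1)
    A[0] = 1
    for i in range(d, n + 1):
        A[i] = comb(n, i) * sum((-1)**j * comb(i, j) * (q**(i + 1 - d - j) - 1)
                                for j in range(i - d + 1))
    return A

print(mds_weight_distribution(4, 2, 3))     # [1, 0, 0, 8, 0] -- the ternary tetracode
assert sum(mds_weight_distribution(4, 2, 3)) == 3**2
assert sum(mds_weight_distribution(6, 3, 7)) == 7**3
```

As a sanity check, the distribution always sums to q^k, the total number of codewords.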
Exercise 389 In the notation of the proof of Theorem 7.4.1, show that
$$N_t = A_{n-t} + \binom{t+1}{1}A_{n-t-1} + \binom{t+2}{2}A_{n-t-2} + \cdots + \binom{n}{n-t}A_0. \tag{7.11}$$

Exercise 390 Verify (7.11). Hint: 0 = (1 − 1)^i.
Exercise 391 Find the weight enumerator of the [6, 3, 4] hexacode over F4 .
Exercise 392 (Hard) Do the following:
(a) Prove that
$$\sum_{j=0}^{m} (-1)^j \binom{i}{j} = (-1)^m \binom{i-1}{m}.$$
(b) The weight distribution of an [n, k] MDS code over Fq can be given by
$$A_i = \binom{n}{i}(q-1)\sum_{j=0}^{i-d} (-1)^j \binom{i-1}{j}\, q^{i-d-j}$$
for d ≤ i ≤ n, where d = n − k + 1. Verify the equivalence of this form with that given in Theorem 7.4.1. Hint: Factor q − 1 out of the expression for Ai from the theorem to create a double summation. Interchange the order of summation and use (a).
Corollary 7.4.2 If C is an [n, k, d] MDS code over Fq, then $A_d = \binom{n}{d}(q - 1)$.
Exercise 393 Prove Corollary 7.4.2 and give a combinatorial interpretation of the result.
Corollary 7.4.3 Assume that there exists an [n, k, d] MDS code C over Fq .
(i) If 2 ≤ k, then d = n − k + 1 ≤ q.
(ii) If k ≤ n − 2, then k + 1 ≤ q.
Proof: By Theorem 7.4.1, if d < n, then
$$A_{d+1} = \binom{n}{d+1}\left((q^2 - 1) - (d+1)(q-1)\right) = \binom{n}{d+1}(q-1)(q-d).$$
As 0 ≤ A_{d+1}, (i) holds. Because C⊥ is an [n, n − k, k + 1] MDS code by Theorem 2.4.3, applying (i) to C⊥ shows that if 2 ≤ n − k, then k + 1 ≤ q, which is (ii). □
The preceding corollary provides bounds on the length and dimension of [n, k] MDS
codes over Fq . If k = 1, there are arbitrarily long MDS codes, and these codes are all
monomially equivalent to repetition codes; if k = n − 1, there are again arbitrarily long
MDS codes that are duals of the MDS codes of dimension 1. These [n, 1] and [n, n − 1] MDS
codes are among the codes we called trivial MDS codes in Section 2.4. Certainly, the zero
code and the whole space Fqn are MDS and can also be arbitrarily long. By Corollary 7.4.3(ii),
if k ≤ n − 2, then k ≤ q − 1. So nontrivial [n, k] MDS codes can exist only when 2 ≤ k ≤
min{n − 2, q − 1}. By Corollary 7.4.3(i), n ≤ q + k − 1 implying that nontrivial MDS
codes over Fq cannot be arbitrarily long; in particular as k ≤ min{n − 2, q − 1} ≤ q − 1,
n ≤ 2q − 2. In summary we have the following.
Corollary 7.4.4 Assume that there exists an [n, k, d] MDS code over Fq .
(i) If the code is trivial, it can be arbitrarily long.
(ii) If the code is nontrivial, then 2 ≤ k ≤ min{n − 2, q − 1} and n ≤ q + k − 1 ≤ 2q − 2.
Recall that the generalized Reed–Solomon codes and extended generalized Reed–
Solomon codes over Fq of lengths n ≤ q + 1 are MDS by Theorem 5.3.4. When q is
even, there are [q + 2, 3] and [q + 2, q − 1] MDS codes related to GRS codes. These are
the longest known MDS codes, which leads to the following:
MDS Conjecture If there is a nontrivial [n, k] MDS code over Fq , then n ≤ q + 1, except
when q is even and k = 3 or k = q − 1 in which case n ≤ q + 2.
The MDS conjecture is true for k = 2 by Corollary 7.4.4(ii). The conjecture has been
proved in certain other cases, an instance of which is the following.
Theorem 7.4.5 Let C be an [n, k] MDS code over Fq.
(i) [299] If q is odd and 2 ≤ k < (√q + 13)/4, then n ≤ q + 1.
(ii) [321] If q is even and 5 ≤ k < (2√q + 15)/4, then n ≤ q + 1.
7.5 Coset weight distributions
In this section we consider the weight distribution of a coset of a code. We will find that
some cosets will have uniquely determined distributions. Before we proceed with the main
result, we state a lemma that is used several times.
Lemma 7.5.1 Let C be an [n, k] code over Fq. Let v be a vector in Fq^n but not in C, and let D be the [n, k + 1] code generated by C and v. The following hold:
(i) The weight distributions of v + C and αv + C, when α ≠ 0, are identical.
(ii) Ai(v + C) = Ai(D \ C)/(q − 1) for 0 ≤ i ≤ n.
Proof: Part (i) follows from the two observations that wt(α(v + c)) = wt(v + c) and αv + C = α(v + C). Part (ii) follows from part (i) and the fact that D is the disjoint union of the cosets αv + C for α ∈ Fq. □
The following theorem is an application of Theorem 7.3.1. Part (i) is the linear case of a
more general result of Delsarte [62] on the covering radius ρ(C) of a code C. The remainder
of the theorem is due to Assmus and Pless [9]. Recall that the weight of a coset of a code
is the smallest weight of any vector in the coset; a coset leader is a vector in a coset of this
smallest weight.
Theorem 7.5.2 Let C be an [n, k] code over Fq . Let S = {i > 0 | Ai (C ⊥ ) = 0} and s = |S|.
Then:
(i) ρ(C) ≤ s,
(ii) each coset of C of weight s has the same weight distribution,
(iii) the weight distribution of any coset of weight less than s is determined once the number
of vectors of weights 1, 2, . . . , s − 1 in the coset are known, and
(iv) if C is an even binary code, each coset of weight s − 1 has the same weight distribution.
Proof: Let v be a coset leader of a coset of C with wt(v) = w. Let D be the (k + 1)-dimensional code generated by C and v. As D⊥ ⊂ C⊥, Ai(D⊥) = 0 if i ≠ 0 and i ∉ S.
As v is a coset leader, wt(x) ≥ w for all x ∈ D \ C by Lemma 7.5.1(ii). In particular,
Ai (D) = Ai (C) for 0 ≤ i < w, and w is the smallest weight among all the vectors in D \ C.
Also, if the weight distribution of C is known, then knowing the weight distribution of D is
equivalent to knowing the weight distribution of v + C by Lemma 7.5.1(ii).
Assume first that w ≥ s. Thus A1(D), A2(D), . . . , As−1(D) are known as they are A1(C), A2(C), . . . , As−1(C). Also Ai(D⊥) is known for i ∈ S̄. Note that none of these numbers depend on v, only that w ≥ s. By Theorem 7.3.1 the weight distribution of D and D⊥
is uniquely determined; thus this distribution is independent of the choice of v. But then
the weight distribution of D \ C is uniquely determined, and so the coset leader, being the
smallest weight of vectors in D \ C, has a uniquely determined weight. Since we know
that there is some coset leader v with wt(v) = ρ(C) and that there are cosets of all smaller
weights by Theorem 1.12.6(v), this uniquely determined weight must be ρ(C). Hence it is
not possible for s to be smaller than ρ(C), proving (i). When wt(v) = s, the weight distribution of D \ C is uniquely determined and therefore so is the weight distribution of
v + C by Lemma 7.5.1(ii), proving (ii). If w < s, Theorem 7.3.1 shows the weight distributions of D and D⊥ are known as long as A1 (D), A2 (D), . . . , As−1 (D) are known; but
this is equivalent to knowing the number of vectors in v + C of weights 1, 2, . . . , s − 1,
yielding (iii).
Assume now that C is a binary code with only vectors of even weight. Let v be a coset
leader of a coset of C with wt(v) = s − 1. Then D has only even weight vectors if s − 1
is even, in which case An (D⊥ ) = 1; if s − 1 is odd, D has odd weight vectors and so
An (D⊥ ) = 0. Therefore Ai (D⊥ ) is known except for values in S \ {n}, a set of size s − 1
as An (C ⊥ ) = 1. Because Ai (D) = Ai (C) for 0 ≤ i ≤ s − 2, by Theorem 7.3.1, the weight
distribution of D, and hence of D \ C = v + C, is uniquely determined.
Example 7.5.3 The complete coset weight distribution of the [8, 4, 4] extended binary
Hamming code C = Ĥ3 was given in Example 1.11.7. We redo this example in light of
the previous theorem. This code is self-dual with A0 (C) = 1, A4 (C) = 14, and A8 (C) = 1.
Thus in Theorem 7.5.2, S = {4, 8} and so s = 2. By parts (ii) and (iv) of this theorem, the
distributions of the cosets of weights 1 and 2 are each uniquely determined, and by part (i)
there are no other coset weights. Recall from Exercise 72 that odd weight cosets have only
odd weight vectors and even weight cosets have only even weight vectors as C has only even
weight vectors. In particular, all odd weight vectors in F_2^8 are in the cosets of weight 1. If
there are two weight 1 vectors in a coset of weight 1, their difference is a weight two vector
in the code, which is a contradiction. Hence
there are precisely eight cosets of weight 1. If v + C has weight 1, then Ai (v + C) = \binom{8}{i}/8
for i odd, which agrees with the results in Example 1.11.7. There are 2^4 − 9 = 7 remaining
cosets; all must have weight 2. Thus if v + C has weight 2, then Ai (v + C) = (\binom{8}{i} − Ai (C))/7
for i even, again agreeing with the results in Example 1.11.7.
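The computations in this example are small enough to check exhaustively. The sketch below (assuming one standard generator matrix for Ĥ3, realized here as the Reed–Muller code RM(1,3); Example 1.11.7 may use a different but equivalent matrix) partitions F_2^8 into the 16 cosets of the code and tabulates their weight distributions.

```python
from itertools import product
from collections import Counter

# One standard generator matrix for the [8,4,4] extended binary Hamming code,
# realized as the Reed-Muller code RM(1,3); rows are 8-bit masks.
G = [0b11111111, 0b00001111, 0b00110011, 0b01010101]

code = set()
for coeffs in product([0, 1], repeat=4):
    w = 0
    for c, row in zip(coeffs, G):
        if c:
            w ^= row
    code.add(w)

# Partition F_2^8 into the 16 cosets v + C and record each coset's weight
# (its minimum weight) together with its weight distribution.
cosets = []
seen = set()
for v in range(2 ** 8):
    if v in seen:
        continue
    coset = {v ^ c for c in code}
    seen |= coset
    d = Counter(bin(x).count("1") for x in coset)
    cosets.append((min(d), dict(d)))

by_weight = Counter(w for w, _ in cosets)
```

Running it confirms the picture of Theorem 7.5.2 with s = 2: one coset of weight 0, eight of weight 1, and seven of weight 2, with a single weight distribution for each coset weight.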
Exercise 394 Find the weight distribution of the cosets of the [7, 4, 3] binary Hamming
code H3 .
In the next example, we use the technique in the proof of Theorem 7.5.2 to find the weight
distribution of a coset of weight 4 in a [16, 8, 4] self-dual doubly-even binary code. In the
exercises, the weight distributions of the cosets of weights 1 and 3 will be determined in
the same manner. Determining the weight distributions of the cosets of weight 2 is more
complicated and requires some specific knowledge of the code; this will also be considered
in the exercises and in Example 7.5.6.
Example 7.5.4 Let C be a [16, 8, 4] self-dual doubly-even binary code. From Example 7.3.3, its weight distribution is given by A0 (C) = A16 (C) = 1, A4 (C) = A12 (C) = 28,
and A8 (C) = 198. In the notation of Theorem 7.5.2, S = {4, 8, 12, 16} and s = 4. Hence
the covering radius of C is at most 4. Suppose that v is a coset leader of a coset of C of weight
4. (Note that we do not know if such a coset leader exists.) Let D be the [16, 9, 4] linear
code generated by C and v; thus D = C ∪ (v + C). D is an even code, and D⊥ ⊂ C. Let
Ai = Ai (D⊥ ) and A⊥_i = Ai (D). Since D is even, D⊥ contains the all-one vector. Thus we
know A0 = A16 = 1 and Ai = 0 for all other i except possibly i = 4, 8, or 12 as D⊥ ⊂ C.
Thus using the first three equations from (7.8), we obtain
1 + A4 + A8 + A12 + 1 = 2^7,
4A4 + 8A8 + 12A12 + 16 = 2^6 · 16,
16A4 + 64A8 + 144A12 + 256 = 2^5 · 16 · 17.
Solving produces A4 = A12 = 12 and A8 = 102. As D contains the all-one vector, A⊥_i =
A⊥_{16−i}. As D is even with minimum weight 4, A⊥_i = 0 for i odd, and hence we only need
A⊥_4, A⊥_6, and A⊥_8 to determine the weight distribution of D completely. Using (M3), (K),
or (P1) we discover that A⊥_4 = 44, A⊥_6 = 64, and A⊥_8 = 294. The weight distribution of the
coset v + C is Ai (v + C) = Ai (D) − Ai (C). So the weight distribution of a coset of weight 4
is uniquely determined to be:
Weight                4    6    8    10   12
Number of vectors    16   64   96   64   16
Notice that we still do not know if such a coset exists.
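The 3 × 3 linear system above can be checked mechanically. A minimal sketch using exact rational arithmetic (plain Gaussian elimination; nothing here is specific to coding theory):

```python
from fractions import Fraction

# Coefficient matrix and right-hand side of the three equations, after moving
# the known terms across: A4 + A8 + A12 = 2^7 - 2, and so on.
M = [[1, 1, 1],
     [4, 8, 12],
     [16, 64, 144]]
b = [2 ** 7 - 2, 2 ** 6 * 16 - 16, 2 ** 5 * 16 * 17 - 256]

def solve3(M, b):
    """Gaussian elimination over the rationals for a nonsingular 3x3 system."""
    A = [[Fraction(x) for x in row] + [Fraction(v)] for row, v in zip(M, b)]
    for i in range(3):
        p = next(r for r in range(i, 3) if A[r][i] != 0)   # pivot row
        A[i], A[p] = A[p], A[i]
        A[i] = [x / A[i][i] for x in A[i]]                 # normalize pivot
        for r in range(3):
            if r != i:
                A[r] = [x - A[r][i] * y for x, y in zip(A[r], A[i])]
    return [A[r][3] for r in range(3)]

A4, A8, A12 = solve3(M, b)    # expect 12, 102, 12
```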
Exercise 395 Let C be a [16, 8, 4] self-dual doubly-even binary code. Analogous to
Example 7.5.4, let v be a coset leader of a coset of C of weight 3 and let D be the [16, 9, 3]
code D = C ∪ (v + C). Let Ai = Ai (D⊥ ) and A⊥_i = Ai (D). (As with the coset of weight 4,
we do not know if such a coset exists.)
(a) Show that A16 = 0 and that the only unknown Ai s are A4 , A8 , and A12 .
(b) Show that A⊥_i = A⊥_{16−i} and A⊥_1 = A⊥_2 = 0.
(c) Use the first three equations of (7.8) to compute A4 , A8 , and A12 .
(d) By Exercise 72, v + C has only odd weight vectors. Show that all odd weight vectors
in D are in v + C.
(e) Use (M3 ), (K ), or (P1 ) to find the weight distribution of v + C.
Exercise 396 Let C be a [16, 8, 4] self-dual doubly-even binary code. Analogous to Exercise 395, let v be a coset leader of a coset of C of weight 1 and let D be the [16, 9, 1] code
D = C ∪ (v + C). Let Ai = Ai (D⊥ ) and A⊥_i = Ai (D).
(a) Why do we know that a coset of weight 1 must exist?
(b) Show that A16 = 0 and that the only unknown Ai s are A4 , A8 , and A12 .
(c) Show that A⊥_i = A⊥_{16−i}, A⊥_1 = 1, and A⊥_2 = 0.
(d) Use the first three equations of (7.8) to compute A4 , A8 , and A12 .
(e) By Exercise 72, v + C has only odd weight vectors. Show that all odd weight vectors
in D are in v + C.
(f) Use (M3 ), (K ), or (P1 ) to find the weight distribution of v + C.
Exercise 397 Let C be a [16, 8, 4] self-dual doubly-even binary code. After having done
Exercises 395 and 396, do the following:
(a) Show that all odd weight vectors of F_2^16 are in a coset of C of either weight 1 or weight 3.
(b) Show that there are exactly 16 cosets of C of weight 1 and exactly 112 cosets of C of
weight 3.
Notice that this exercise shows that the cosets of weight 3 in fact exist.
Before proceeding with the computation of the weight distributions of the cosets of
weight 2 in the [16, 8, 4] codes, we present the following useful result.
Theorem 7.5.5 Let C be a code of length n over Fq with A ∈ ΓAut(C). Let v ∈ F_q^n. Then
the weight distribution of the coset v + C is the same as the weight distribution of the coset
vA + C.
Proof: If x ∈ F_q^n, then applying A to x rescales the components with nonzero scalars, permutes the components, and then maps them to other field elements under a field automorphism. Thus wt(x) = wt(xA). As A ∈ ΓAut(C), vA + C = {vA + c | c ∈ C} = {(v + c)A |
c ∈ C} implying that vA + C and v + C have the same weight distribution.
Example 7.5.6 Let C be a [16, 8, 4] self-dual doubly-even binary code. By [262], up to
equivalence there are two such codes. Each code has different weight distributions for its
cosets of weight 2. These two codes, denoted C 1 and C 2 , have generator matrices G 1 and
G 2 , respectively, where
G1 =
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0
1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1
0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0
and

G2 =
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0.
In this example, we compute the weight distribution of the cosets of C 1 of weight 2. Notice
that C 1 is the direct sum of an [8, 4, 4] code with itself; the only such code is Ĥ3 by
Exercise 56. Let v be a weight 2 vector that is a coset leader of a coset of C 1 . As we want to
find the weight distribution of v + C 1 , by Theorem 7.5.5, we may replace v by vA, where A
is a product of the automorphisms of C 1 given in Exercise 398. We use the notation of that
exercise. Let supp(v) = {a, b}; when we replace v by vA, we will still denote the support
of this new vector by {a, b}, where a and b will have different values. If both a and b are at
least 9, apply P5 to v. Thus we may assume that 1 ≤ a < b ≤ 8 or 1 ≤ a ≤ 8 < b. In the
first case, if a ≠ 1, apply powers of P3 followed by P1 so that we may assume that a = 1;
now apply powers of P3 again so that we may assume that b = 2. In the second case, if
a ≠ 1, apply powers of P3 followed by P1 so that we may assume that a = 1; if b ≠ 9,
apply powers of P4 followed by P2 so that we may assume that b = 9. Thus there are only
two possible coset leaders that we need to consider:
(a) v1 = 1100000000000000, and
(b) v2 = 1000000010000000.
We first consider v = v1 . Let D be the [16, 9, 2] code D = C 1 ∪ (v1 + C 1 ). Let Ai = Ai (D⊥ )
and A⊥_i = Ai (D). D is an even code and so D⊥ contains the all-one vector. Thus we know
A0 = A16 = 1 and Ai = 0 for all other i except possibly i = 4, 8, or 12 as D⊥ ⊂ C 1 . We also
know A⊥_1 = 0 and need to determine A⊥_2. The only weight 2 vectors in v1 + C 1 other than
v1 are v1 + c, where c is a vector of weight 4 whose support contains {1, 2}. By inspecting
G1 , it is clear that the only choices for such a c are the first three rows of G1 . Therefore
A⊥_2 = 4. Thus using the first three equations from (7.8), we obtain
1 + A4 + A8 + A12 + 1 = 2^7,
4A4 + 8A8 + 12A12 + 16 = 2^6 · 16,
16A4 + 64A8 + 144A12 + 256 = 2^5(16 · 17 + 8).
Solving yields A4 = A12 = 20 and A8 = 86. As D contains the all-one vector, A⊥_i = A⊥_{16−i}.
Using (M3), (K), or (P1), we have A⊥_4 = 36, A⊥_6 = 60, and A⊥_8 = 310. As Ai (v1 + C 1 ) =
Ai (D) − Ai (C 1 ), the weight distribution of the coset v1 + C 1 is:
Weight                2    4    6    8    10   12   14
Number of vectors     4    8   60  112   60    8    4
Now consider v = v2 . Let D be the [16, 9, 2] code D = C 1 ∪ (v2 + C 1 ), which is still
even. As in the case v = v1 , A0 = A16 = 1 and only A4 , A8 , and A12 need to be determined
as the other Ai s are 0. Also A⊥_1 = 0, but A⊥_2 = 1 because there are clearly no weight 4
vectors in C 1 whose support contains {1, 9}. Thus using the first three equations from (7.8),
we have
1 + A4 + A8 + A12 + 1 = 2^7,
4A4 + 8A8 + 12A12 + 16 = 2^6 · 16,
16A4 + 64A8 + 144A12 + 256 = 2^5(16 · 17 + 2).
In the same manner as for v = v1 , we obtain A4 = A12 = 14, A8 = 98, A⊥_4 = 42, A⊥_6 = 63,
and A⊥_8 = 298. The weight distribution of the coset v2 + C 1 is:
Weight                2    4    6    8    10   12   14
Number of vectors     1   14   63  100   63   14    1
Note that the code C 1 and its cosets of weight 1 do not contain vectors of weight 2. Hence
all weight 2 vectors in F_2^16 are coset leaders of cosets of weight 2. We want to count the
number of cosets of weight 2 whose weight distributions are the two possibilities we have
produced. The cosets whose coset leaders have their support entirely contained in either
{1, . . . , 8} or {9, . . . , 16} have the same distribution as v1 + C 1 has. There are 2\binom{8}{2} = 56
such coset leaders. But each coset contains four coset leaders and thus there are 56/4 = 14
such cosets. The cosets whose coset leaders have a 1 in a coordinate in {1, . . . , 8} and a 1 in
a coordinate in {9, . . . , 16} have the same distribution as v2 + C 1 has. There are 8^2 = 64 such
coset leaders and hence 64 such cosets, as each coset has a unique coset leader.
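These counts can be confirmed by brute force. Since C 1 is the direct sum of Ĥ3 with itself, the sketch below builds it from the rows of G1 (encoded as 16-bit masks), partitions F_2^16 into its 256 cosets, and groups the cosets by weight and weight distribution; exhaustive, but small enough to run quickly.

```python
from itertools import product
from collections import Counter

# C1 of Example 7.5.6, built from the rows of G1 as 16-bit masks
# (bit 15 = coordinate 1); the last four rows repeat the pattern on 9..16.
rows8 = [0b1111000000000000, 0b1100110000000000,
         0b1100001100000000, 0b1010101000000000]
G = rows8 + [r >> 8 for r in rows8]

code = set()
for coeffs in product([0, 1], repeat=8):
    w = 0
    for c, row in zip(coeffs, G):
        if c:
            w ^= row
    code.add(w)

# Group the 256 cosets v + C1 by (coset weight, weight distribution).
profile = Counter()
seen = set()
for v in range(2 ** 16):
    if v in seen:
        continue
    coset = {v ^ c for c in code}
    seen |= coset
    d = Counter(bin(x).count("1") for x in coset)
    profile[(min(d), frozenset(d.items()))] += 1

cosets_per_weight = Counter()
for (w, _), count in profile.items():
    cosets_per_weight[w] += count
```

The run confirms 14 cosets of weight 2 with the distribution of v1 + C 1 and 64 with that of v2 + C 1 , as well as the coset counts needed for the table in Exercise 399.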
Exercise 398 Show that the following permutations are automorphisms of the code C 1
defined in Example 7.5.6 where the coordinates are labeled 1, 2, . . . , 16:
(a) P1 = (1, 2)(3, 4),
(b) P2 = (9, 10)(11, 12),
(c) P3 = (2, 3, 5, 4, 7, 8, 6),
(d) P4 = (10, 11, 13, 12, 15, 16, 14), and
(e) P5 = (1, 9)(2, 10)(3, 11)(4, 12)(5, 13)(6, 14)(7, 15)(8, 16).
Exercise 399 Combining Examples 7.5.4 and 7.5.6 and Exercise 397, we have the following table for the weight distributions of the cosets of the code C 1 defined in Example 7.5.6.

Coset                     Number of vectors of given weight                          Number
weight   0  1  2  3   4  5   6  7    8  9  10  11  12  13  14  15  16             of cosets
0        1  0  0  0  28  0   0  0  198  0   0   0  28   0   0   0   1                 1
1                                                                                    16
2        0  0  4  0   8  0  60  0  112  0  60   0   8   0   4   0   0                14
2        0  0  1  0  14  0  63  0  100  0  63   0  14   0   1   0   0                64
3
4        0  0  0  0  16  0  64  0   96  0  64   0  16   0   0   0   0
Fill in the missing entries using results from Exercises 395 and 396. Then complete the
right-hand column for the cosets of weights 3 and 4.
Exercise 400 In this exercise you are to find the weight distributions of the cosets of the
code C 2 defined in Example 7.5.6. The cosets of weights 1, 3, and 4 have been considered
in Example 7.5.4 and Exercises 395 and 396.
(a) Show that the following permutations are automorphisms of C 2 :
(i) P1 = (1, 2)(3, 4),
(ii) P2 = (1, 3, 5, 7, 9, 11, 13, 15)(2, 4, 6, 8, 10, 12, 14, 16), and
(iii) P3 = (3, 5, 7, 9, 11, 13, 15)(4, 6, 8, 10, 12, 14, 16).
(b) Show that the only two coset leaders of weight 2 that need to be considered are:
(i) v1 = 1100000000000000, and
(ii) v2 = 1010000000000000.
(c) Complete an analogous argument to that of Example 7.5.6 to find the weight distributions of the cosets of weight 2.
(d) For this code, complete a similar table to that in Exercise 399.
Exercise 401 Let C be a [24, 12, 8] self-dual doubly-even binary code. In Section 10.1.1
we will see that the only such code is the extended binary Golay code G 24 from Section 1.9.1.
Its weight enumerator is given in Exercise 384.
(a) Show that C has covering radius at most 4. (It turns out that the covering radius is exactly
4; see Theorem 11.1.7.)
(b) Find the weight distributions of the cosets of weight 4 and the cosets of weight 3.
7.6 Weight distributions of punctured and shortened codes
The weight distribution of a code does not in general determine the weight distribution of
a code obtained from it by either puncturing or shortening, but when certain uniformity
conditions hold, the weight distribution of a punctured or shortened code can be determined
from the original code. Let C be an [n, k] code over Fq . Let M be the q^k × n matrix
whose rows are all codewords in C, and let Mi be the submatrix of M consisting of
the codewords of weight i. A code is homogeneous provided that for 0 ≤ i ≤ n, each
column of Mi has the same weight. Recall from Section 1.7 that ŴAut(C) is transitive if
ŴAutPr (C) is a transitive permutation group. As Exercise 402 shows, if C has a transitive
automorphism group, C is homogeneous. This is about the only simple way to decide if a
code is homogeneous.
Exercise 402 Prove that if C has a transitive automorphism group, then C is homogeneous.
Theorem 7.6.1 (Prange) Let C be a homogeneous [n, k, d] code over Fq with d > 1.
Let C* be the code obtained from C by puncturing on some coordinate, and let C_* be
the code obtained from C by shortening on some coordinate. Then for 0 ≤ i ≤ n − 1 we
have:
(i) Ai (C*) = ((n − i)/n) Ai (C) + ((i + 1)/n) Ai+1 (C), and
(ii) Ai (C_*) = ((n − i)/n) Ai (C).
Assume further that C is an even binary code. Then for 1 ≤ j ≤ ⌊n/2⌋ we have:
(iii) A2j (C) = A2j (C*) + A2j−1 (C*),
(iv) A2j−1 (C*) = (2j/n) A2j (C) and A2j (C*) = ((n − 2j)/n) A2j (C), and
(v) 2j A2j (C*) = (n − 2j) A2j−1 (C*).
Proof: The number of nonzero entries in Mi is i Ai (C) = nwi , where wi is the weight of
a column of Mi , since this weight is independent of the column as C is homogeneous.
Thus wi = (i/n)Ai (C), and so each column of Mi has ((n − i)/n)Ai (C) zeros. A vector
of weight i in C* arises either from a vector of weight i in C with a zero in the punctured
coordinate, or a vector of weight i + 1 in C with a nonzero component in the punctured
coordinate; as d > 1 no vector in C* can arise in both ways. Thus (i) holds. A vector of
weight i in C_* arises from a vector of weight i in C with a zero on the shortened coordinate,
yielding (ii).
Now assume that C is an even binary code. Vectors of weight 2j in C give rise to vectors
of weights 2j or 2j − 1 in C*; all vectors of weight 2j or 2j − 1 in C* must arise from
codewords of weight 2j in C. Thus (iii) holds. Since A2j−1 (C) = A2j+1 (C) = 0, substituting
i = 2j − 1 and i = 2j into (i) gives the two equations in (iv). Part (v) follows by solving
each equation in (iv) for A2j (C).
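Parts (i) and (ii) of Prange's Theorem translate directly into code. A sketch that takes a full weight-distribution list A[0], . . . , A[n] of a homogeneous code and returns the distribution of the punctured or shortened code (exact rationals avoid any rounding):

```python
from fractions import Fraction

def prange_punctured(A):
    """Theorem 7.6.1(i): distribution of a homogeneous code punctured on one
    coordinate, from its full distribution A[0..n]."""
    n = len(A) - 1
    return [Fraction(n - i, n) * A[i] + Fraction(i + 1, n) * A[i + 1]
            for i in range(n)]

def prange_shortened(A):
    """Theorem 7.6.1(ii): distribution of the code shortened on one coordinate."""
    n = len(A) - 1
    return [Fraction(n - i, n) * A[i] for i in range(n)]
```

For instance, applied to the distribution 1, 0, 0, 0, 14, 0, 0, 0, 1 of the [8, 4, 4] extended binary Hamming code, the punctured routine returns 1, 0, 0, 7, 7, 0, 0, 1, the distribution of the [7, 4, 3] Hamming code.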
Example 7.6.2 Let C be the [12, 6, 6] extended ternary Golay code. As we will show in
Section 10.4.2, ΓAut(C) is transitive. By Exercise 402, C is homogeneous. Let C* be C
punctured on any coordinate. In Example 7.3.2, the weight distribution of C is given. Using
Prange's Theorem,
W_{C*}(x, y) = y^11 + 132x^5 y^6 + 132x^6 y^5 + 330x^8 y^3 + 110x^9 y^2 + 24x^11.
The code C* is the [11, 6, 5] ternary Golay code.
Exercise 403 Let C be the [24, 12, 8] extended binary Golay code, whose weight distribution is given in Exercise 384. As we will show in Section 10.1.2, ΓAut(C) is transitive;
and so, by Exercise 402, C is homogeneous. Use Prange's Theorem to show that
W_{C*}(x, y) = y^23 + 253x^7 y^16 + 506x^8 y^15 + 1288x^11 y^12
+ 1288x^12 y^11 + 506x^15 y^8 + 253x^16 y^7 + x^23.
The code C* is the [23, 12, 7] binary Golay code.
Exercise 404 Let C be the [24, 12, 8] extended binary Golay code. Let C (i) be the code
obtained from C by puncturing on i points for 1 ≤ i ≤ 4. These are [24 − i, 12, 8 − i] codes
each of which has a transitive automorphism group by the results in Section 10.1.2. The
weight enumerator of C (1) is given in Exercise 403. Find the weight enumerators of C (i) for
2 ≤ i ≤ 4.
Exercise 405 Let C be the [24, 12, 8] extended binary Golay code. Let C (i) be the code
obtained from C by shortening on i points for 1 ≤ i ≤ 4. These are [24 − i, 12 − i, 8] codes
each of which has a transitive automorphism group by the results in Section 10.1.2. Find
the weight enumerators of C (i) for 1 ≤ i ≤ 4.
Exercise 406 Let C be either of the two [16, 8, 4] self-dual doubly-even binary codes with
generator matrices given in Example 7.5.6. By Exercises 398 and 400, ΓAut(C) is transitive
and so C is homogeneous. Use Prange's Theorem to find the weight distribution of C*.
Prange's Theorem also holds for homogeneous nonlinear codes of minimum distance
d > 1 as no part of the proof requires the use of linearity. For example, a coset of weight 2
of the [8, 4, 4] extended binary Hamming code Ĥ3 is homogeneous. Puncturing this coset
gives a coset of weight 1 in H3 , and (i) of Prange's Theorem gives its distribution from
the coset distribution of the original weight 2 coset of Ĥ3 ; see Exercise 407. A coset of
H3 of weight 1 is not homogeneous even though ΓAut(Ĥ3 ) is triply transitive. Verifying
homogeneity of cosets is in general quite difficult. One can show that a coset of the [24, 12, 8]
extended binary Golay code of weight 4 is homogeneous and so obtain the distribution of
a weight 3 coset of the [23, 12, 7] binary Golay code; see Exercise 408.
Exercise 407 Do the following:
(a) Choose a generator matrix for Ĥ3 and any convenient weight 2 vector v in F_2^8. Show
that the nonlinear code C = v + Ĥ3 is homogeneous.
(b) Apply part (i) of Prange's Theorem to find the weight distribution of C* and compare
your answer to the weight distribution of the weight 1 cosets of H3 found in Exercise 394.
(c) Show that a coset of Ĥ3 of weight 1 is not homogeneous.
Exercise 408 Assuming that a weight 4 coset of the [24, 12, 8] extended binary Golay
code is homogeneous, use Prange’s Theorem and the results of Exercise 401 to find the
weight distribution of a weight 3 coset of the [23, 12, 7] binary Golay code.
7.7 Other weight enumerators
There are other weight enumerators for a code that contain more detailed information about
the codewords. If c ∈ F_q^n and α ∈ Fq , let sα (c) be the number of components of c that equal
α. Let Fq = {α0 , α1 , . . . , αq−1 }. The polynomial
W′_C(x_{α0}, x_{α1}, . . . , x_{αq−1}) = ∑_{c∈C} ∏_{α∈Fq} x_α^{sα(c)}
is called the complete weight enumerator of C. Notice that if P is a permutation matrix,
then C P and C have the same complete weight enumerator; however, generally C M and C
do not have the same complete weight enumerator if M is a monomial matrix. If q = 2,
then WC′ (x0 , x1 ) is the ordinary weight enumerator WC (x1 , x0 ). MacWilliams [213] also
proved that there is an identity between the complete weight enumerator of C and the
complete weight enumerator of C ⊥ . This identity is the same as (M3 ) if q = 2. If q = 3,
with F3 = {0, 1, 2}, the identity is
W′_{C⊥}(x0 , x1 , x2 ) = (1/|C|) W′_C(x0 + x1 + x2 , x0 + ξ x1 + ξ^2 x2 , x0 + ξ^2 x1 + ξ x2 ),   (7.12)
where ξ = e^{2πi/3} is a primitive complex cube root of unity. If q = 4, with F4 = {0, 1, ω, ω̄}
as in Section 1.1, there are two MacWilliams identities depending on whether one is considering the Hermitian or the ordinary dual; see [291]. For the ordinary dual it is
W′_{C⊥}(x0 , x1 , xω , xω̄ ) = (1/|C|) W′_C(x0 + x1 + xω + xω̄ , x0 + x1 − xω − xω̄ ,
x0 − x1 − xω + xω̄ , x0 − x1 + xω − xω̄ ).   (7.13)
And for the Hermitian dual, the MacWilliams identity is
W′_{C⊥H}(x0 , x1 , xω , xω̄ ) = (1/|C|) W′_C(x0 + x1 + xω + xω̄ , x0 + x1 − xω − xω̄ ,
x0 − x1 + xω − xω̄ , x0 − x1 − xω + xω̄ ).   (7.14)
Exercise 409 Recall that the [4, 2] ternary tetracode H3,2 , introduced in Example 1.3.3,
has generator matrix
G =
1 0 1 1
0 1 1 2.
Find the complete weight enumerator of H3,2 and show that the MacWilliams identity (7.12)
is satisfied; use of a computer algebra system will be most helpful.
Exercise 410 Let C be the [4, 2, 3] extended Reed–Solomon code over F4 with generator
matrix
G =
1 1 1 1
1 ω ω̄ 0.
(a) Show that C is self-dual under the ordinary inner product.
(b) Find the complete weight enumerator of C and show that the MacWilliams identity
(7.13) is satisfied; use of a computer algebra system will be most helpful.
Exercise 411 Let C be the [6, 3, 4] hexacode over F4 with generator matrix
G =
1 0 0 1 ω ω
0 1 0 ω 1 ω
0 0 1 ω ω 1,
as presented in Example 1.3.4. It is Hermitian self-dual. Find the complete weight enumerator of C and show that the MacWilliams identity (7.14) is satisfied; use of a computer
algebra system will be most helpful.
Exercise 412 Recall that ¯ denotes conjugation in F4 as described in Section 1.3. Do the
following:
(a) Show that W′_{C̄}(x0 , x1 , xω , xω̄ ) = W′_C(x0 , x1 , xω̄ , xω ).
(b) Show that (7.13) holds if and only if (7.14) holds.
Exercise 413 Show that for codes over F4 , WC (x, y) = WC′ (y, x, x, x).
The Lee weight enumerator is sometimes used for codes over fields of characteristic p ≠
2. For such fields, let Fq = {α0 = 0, α1 , . . . , αr , −α1 , . . . , −αr }, where r = (q − 1)/2.
The Lee weight enumerator of C is the polynomial
W″_C(x0 , x_{α1}, . . . , x_{αr}) = ∑_{c∈C} x0^{s0(c)} ∏_{i=1}^{r} x_{αi}^{s_{αi}(c) + s_{−αi}(c)}.
The Lee weight enumerator reduces the number of variables in the complete weight enumerator essentially by half. If q = 3, then W_C(x1 , x0 ) = W″_C(x0 , x1 ). There is also a relationship
between the Lee weight enumerator of C and the Lee weight enumerator of C ⊥ , which is
obtained by equating x_{αi} and x_{−αi} in the MacWilliams identity for the complete weight
enumerator.
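For q = 3 the definition collapses to two variables, x0 and x1 , since s1 (c) + s2 (c) is just the number of nonzero entries of c. A small sketch representing W″_C as a dictionary of monomial exponents (the [3, 1] repetition code serves merely as a convenient test case):

```python
def lee_enumerator_q3(code):
    """Lee weight enumerator of a ternary code as {(e0, e1): coeff} for the
    monomials x0^e0 x1^e1; here s_1(c) + s_2(c) counts the nonzero entries,
    since 2 = -1 in F_3."""
    poly = {}
    for c in code:
        s0 = sum(1 for a in c if a == 0)
        key = (s0, len(c) - s0)
        poly[key] = poly.get(key, 0) + 1
    return poly

C = [(a, a, a) for a in range(3)]      # [3,1] ternary repetition code
W = lee_enumerator_q3(C)               # x0^3 + 2*x1^3
```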
MacWilliams identities are established for other families of codes and other types of
weight enumerators in [291].
7.8 Constraints on weights
In this section, we will present constraints on the weights of codewords in even binary
codes, where we recall that a binary code is called even if all of its codewords have even
weight. The following identity from Theorem 1.4.3 for two vectors x, y ∈ F_2^n will prove
useful:
wt(x + y) = wt(x) + wt(y) − 2wt(x ∩ y),
(7.15)
where x ∩ y is the vector in F_2^n which has 1s precisely in those positions where both x and
y have 1s. When the ordinary inner product between two vectors x and y is 0, we say the
vectors are orthogonal and denote this by x⊥y.
The following is an easy consequence of (7.15).
Lemma 7.8.1 Let x and y be two vectors in F_2^n.
(i) Suppose wt(x) ≡ 0 (mod 4). Then wt(x + y) ≡ wt(y) (mod 4) if and only if x⊥y, and
wt(x + y) ≡ wt(y) + 2 (mod 4) if and only if x is not orthogonal to y.
(ii) Suppose wt(x) ≡ 2 (mod 4). Then wt(x + y) ≡ wt(y) (mod 4) if and only if x is not
orthogonal to y, and wt(x + y) ≡ wt(y) + 2 (mod 4) if and only if x⊥y.
Exercise 414 Prove Lemma 7.8.1.
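Both (7.15) and Lemma 7.8.1 are easy to confirm exhaustively for small n before proving them; a quick check over all pairs of vectors in F_2^6 (a sanity check, not a proof):

```python
from itertools import product

def wt(x):
    return sum(x)

def add(x, y):
    return tuple(a ^ b for a, b in zip(x, y))

def cap(x, y):                        # the vector x ∩ y
    return tuple(a & b for a, b in zip(x, y))

def dot(x, y):                        # ordinary inner product over F_2
    return sum(a & b for a, b in zip(x, y)) % 2

ok = True
for x in product([0, 1], repeat=6):
    for y in product([0, 1], repeat=6):
        # identity (7.15)
        if wt(add(x, y)) != wt(x) + wt(y) - 2 * wt(cap(x, y)):
            ok = False
        shift = (wt(add(x, y)) - wt(y)) % 4
        if wt(x) % 4 == 0:            # Lemma 7.8.1(i)
            if shift != (0 if dot(x, y) == 0 else 2):
                ok = False
        if wt(x) % 4 == 2:            # Lemma 7.8.1(ii)
            if shift != (2 if dot(x, y) == 0 else 0):
                ok = False
```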
We say that a binary vector is doubly-even if its weight is divisible by 4. A binary vector
is singly-even if its weight is even but not divisible by 4. Recall from Theorem 1.4.8 that
if a binary code has only doubly-even vectors, the code is self-orthogonal. We will be
primarily interested in codes where not all codewords are doubly-even. We will present a
decomposition of even binary codes into pieces that are called the hull of the code, H-planes,
and A-planes. An H-plane or A-plane is a 2-dimensional space spanned by x and y; such a
space is denoted span{x, y}. The hull, H-plane, and A-plane are defined as follows:
• The hull of a code C is the subcode H = C ∩ C ⊥ .
• If x and y are doubly-even but not orthogonal, then we call span{x, y} an H-plane.
Note that if span{x, y} is an H-plane, then wt(x + y) ≡ 2 (mod 4) by Lemma 7.8.1(i),
and so an H-plane contains two nonzero doubly-even vectors and one singly-even
vector.
• If x and y are singly-even but not orthogonal, then we call span{x, y} an A-plane. If
span{x, y} is an A-plane, then wt(x + y) ≡ 2 (mod 4) still holds by Lemma 7.8.1(ii); an
A-plane therefore contains three singly-even vectors.
Exercise 415 Prove that if C is an [n, k] even binary code, then either C is self-orthogonal
or its hull has dimension at most k − 2.
Exercise 416 Let C be a 2-dimensional even binary code that is not self-orthogonal. Prove
the following:
(a) The hull of C is {0}.
(b) Either C contains three singly-even vectors, in which case it is an A-plane, or C contains
exactly two doubly-even nonzero vectors, in which case it is an H-plane.
Before proceeding with our decomposition, we need a few preliminary results.
Lemma 7.8.2 Let C be an even binary code of dimension k ≥ 2 whose hull is {0}. The
following hold:
(i) If C contains a nonzero doubly-even vector, it contains an H-plane.
(ii) If C contains no nonzero doubly-even vectors, it is 2-dimensional and is an A-plane.
Proof: Suppose that C contains a nonzero doubly-even vector x. Then as x ∉ C ⊥ , there
is a codeword y ∈ C with x not orthogonal to y. If y is doubly-even, then span{x, y} is an H-plane. If y is
singly-even, then x + y is doubly-even by Lemma 7.8.1 and span{x, x + y} = span{x, y} is
still an H-plane. This proves (i). Part (ii) follows from Exercise 417.
Exercise 417 Let C be an even binary code with dimension k ≥ 2 whose hull is {0}. Prove
that if C has no nonzero doubly-even vectors, then k = 2 and C is an A-plane.
To describe our decomposition we need one more piece of notation. Suppose that C
contains two subcodes, A and B, such that C = A + B = {a + b | a ∈ A and b ∈ B}, where
A ∩ B = {0} and a · b = 0 for all a ∈ A and b ∈ B. Then we say that C is the orthogonal
sum of A and B, and denote this by C = A⊥B. (In vector space terminology, C is the
orthogonal direct sum of A and B.)
Exercise 418 Let C = A1 ⊥A2 , where A1 and A2 are A-planes. Prove that C = H1 ⊥H2 ,
where H1 and H2 are H-planes.
Lemma 7.8.3 Let C be an even binary code whose hull is {0}. Then C has even dimension
2m and one of the following occurs:
(i) C = H1 ⊥H2 ⊥ · · · ⊥Hm , where H1 , H2 , . . . , Hm are H-planes.
(ii) C = H1 ⊥H2 ⊥ · · · ⊥Hm−1 ⊥A, where H1 , H2 , . . . , Hm−1 are H-planes and A is an
A-plane.
Proof: This is proved by induction on the dimension of C. C cannot have dimension 1 as,
otherwise, its nonzero vector, being even, is orthogonal to itself and hence in the hull. So
C has dimension at least 2. If C has dimension 2, it is either an H-plane or an A-plane by
Exercise 416. So we may assume that C has dimension at least 3. By Lemma 7.8.2, C contains
an H-plane H1 . The proof is complete by induction if we can show that C = H1 ⊥C 1 , where
the hull of C 1 is {0}. Let H1 = span{x, y} where x and y are nonzero doubly-even vectors
with x not orthogonal to y. Define f : C → F_2^2, where f (c) = (c · x, c · y). By Exercise 419, f is a surjective
linear transformation. Let C 1 be its kernel; C 1 has dimension k − 2. As x is not orthogonal to y and x is not
orthogonal to x + y, none of x, y, or x + y is in C 1 . Thus H1 ∩ C 1 = {0}. By the definition of f , H1 is orthogonal
to C 1 . So C = H1 ⊥C 1 . If z ≠ 0 is in the hull of C 1 , it is orthogonal to C 1 and to H1 ; hence
z is in the hull of C, a contradiction.
Exercise 419 Let C be a binary code with two nonzero even vectors x and y, where
x is not orthogonal to y. Define f : C → F_2^2, where f (c) = (c · x, c · y). Show that f is a surjective linear
transformation.
The following is the converse of Lemma 7.8.3; we will need this result in Chapter 9.
Lemma 7.8.4 If C = P 1 ⊥ · · · ⊥P k is an orthogonal sum where each P i is either an
H-plane or an A-plane, then the hull of C is {0}.
Proof: If x ∈ C ∩ C ⊥ , where x = x1 + · · · + xk with xi ∈ P i for 1 ≤ i ≤ k, then x · y = 0
for all y ∈ C. As P i ∩ P i⊥ = {0}, if xi ≠ 0, there exists yi ∈ P i with xi · yi ≠ 0. Thus
0 = x · yi ; but x · yi = xi · yi ≠ 0 as x j ⊥yi if j ≠ i. This contradiction shows xi = 0. So
C ∩ C ⊥ = {0}.
We now state our main decomposition result, found in [3].
Theorem 7.8.5 Let C be an [n, k] even binary code whose hull H has dimension r . Then
k = 2m + r for some nonnegative integer m and C has one of the following forms:
(H) C = H⊥H1 ⊥H2 ⊥ · · · ⊥Hm , where H1 , H2 , . . . , Hm are H-planes and H is doubly-even.
(O) C = H⊥H1 ⊥H2 ⊥ · · · ⊥Hm , where H1 , H2 , . . . , Hm are H-planes and H is singly-even.
(A) C = H⊥H1 ⊥H2 ⊥ · · · ⊥Hm−1 ⊥A, where H1 , H2 , . . . , Hm−1 are H-planes, A is an
A-plane, and H is doubly-even.
Proof: If C is self-orthogonal, we have either form (H) or form (O) with m = 0. Assume C is not self-orthogonal.
Let x1 , x2 , . . . , xr be a basis of H; extend this to a basis x1 , x2 , . . . , xk of C. Let C 1 =
span{xr +1 , xr +2 , . . . , xk }. Clearly, C = H⊥C 1 as every vector in H is orthogonal to every
vector in C. If z is in the hull of C 1 , it is orthogonal to everything in C 1 and in H, and
hence to everything in C. Thus z is also in H, implying that z = 0. So the hull of C 1 is
{0}. By Lemma 7.8.3, the result follows once we show that the decomposition given in (A)
with H singly-even reduces to (O). In that case, let A = span{x, y} and let z be a singly-even
vector in H. By Lemma 7.8.1, x1 = z + x and y1 = z + y are both doubly-even. As
x is not orthogonal to y, x1 is not orthogonal to y1 ; hence Hm = span{x1 , y1 } is an H-plane. Clearly, H⊥H1 ⊥ · · · ⊥Hm−1 ⊥A =
H⊥H1 ⊥ · · · ⊥Hm and we have form (O).
Exercise 420 If C is a singly-even self-orthogonal binary code, what is its form in
Theorem 7.8.5?
As a consequence of this theorem, we can obtain constraints on the number of singly-even
and doubly-even codewords in an even binary code; see [31, 164, 271].
Theorem 7.8.6 Let C be an [n, k] even binary code whose hull has dimension r . Then
k = 2m + r . Let a be the number of doubly-even vectors in C and b the number of singly-even
vectors in C.
(i) If C has form (H), then a = 2^r(2^{2m−1} + 2^{m−1}) and b = 2^r(2^{2m−1} − 2^{m−1}).
(ii) If C has form (O), then a = b = 2^{k−1}.
(iii) If C has form (A), then a = 2^r(2^{2m−1} − 2^{m−1}) and b = 2^r(2^{2m−1} + 2^{m−1}).
Proof: We leave the proofs of (i) and (ii) as exercises. Suppose C has form (A). Let C 1 =
H1 ⊥H2 ⊥ · · · ⊥Hm−1 ⊥A. As H is doubly-even, by Lemma 7.8.1, a, respectively b, is
2r times the number of doubly-even, respectively singly-even, vectors in C 1 . Therefore
we only need look at C 1 where we prove the result by induction on m. When m = 1,
278
Weight distributions
C 1 = A and the result follows as A contains three singly-even vectors and one doublyeven vector. Suppose that the number of doubly-even vectors in H1 ⊥H2 ⊥ · · · ⊥Hi−1 ⊥A
for some i ≥ 1 is 2^{2i−1} − 2^{i−1} and the number of singly-even vectors is 2^{2i−1} + 2^{i−1}. As
Hi has one singly-even and three doubly-even vectors, the number of doubly-even vectors
in H1⊥H2⊥ · · · ⊥Hi⊥A is 2^{2i−1} + 2^{i−1} + 3(2^{2i−1} − 2^{i−1}) = 2^{2i+1} − 2^i by Lemma 7.8.1.
Similarly, the number of singly-even vectors in H1⊥H2⊥ · · · ⊥Hi⊥A is 3(2^{2i−1} + 2^{i−1}) +
2^{2i−1} − 2^{i−1} = 2^{2i+1} + 2^i.
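The three count patterns in Theorem 7.8.6 are easy to confirm by brute force on small codes. The sketch below is our own illustration, not from the text: it enumerates an even binary code from generator rows given as bit-masks, computes the hull dimension r, and matches the doubly-even/singly-even counts (a, b) against the three formulas. The three sample codes are our own choices, picked to realize forms (H), (O), and (A).

```python
def form_of(G, n):
    """Classify an even binary code (generator rows given as n-bit integers)
    as form (H), (O), or (A) by matching its doubly-even/singly-even counts
    (a, b) against Theorem 7.8.6.  Brute force, so only for tiny n."""
    k = len(G)
    words = {0}
    for g in G:                          # enumerate all 2^k codewords
        words |= {w ^ g for w in words}
    dot = lambda u, v: bin(u & v).count("1") % 2
    dual = {v for v in range(1 << n) if all(dot(v, g) == 0 for g in G)}
    hull = words & dual                  # hull = C intersect C-perp
    r = len(hull).bit_length() - 1       # the hull is a subspace, |hull| = 2^r
    m = (k - r) // 2                     # Theorem 7.8.6: k = 2m + r
    a = sum(1 for w in words if bin(w).count("1") % 4 == 0)
    b = (1 << k) - a
    predictions = {
        "H": (2**(r + 2*m - 1) + 2**(r + m - 1), 2**(r + 2*m - 1) - 2**(r + m - 1)),
        "O": (2**(k - 1), 2**(k - 1)),
        "A": (2**(r + 2*m - 1) - 2**(r + m - 1), 2**(r + 2*m - 1) + 2**(r + m - 1)),
    }
    return next(f for f, ab in predictions.items() if ab == (a, b))

# A self-dual doubly-even [8,4,4] code (equivalent to the extended Hamming code): form (H)
print(form_of([0b11110000, 0b00111100, 0b00001111, 0b01010101], 8))  # H
# A singly-even self-orthogonal [4,2] code: form (O)
print(form_of([0b1100, 0b0011], 4))                                  # O
# A [4,2] even code with trivial hull and an A-plane: form (A)
print(form_of([0b1100, 0b1010], 4))                                  # A
```

The second example also answers the question of Exercise 420 in miniature: a singly-even self-orthogonal code has m = 0 yet a = b = 2^{k−1}.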
Exercise 421 Prove that a and b are as claimed in forms (H) and (O) of Theorem 7.8.6.
Exercise 422 Find the form ((H), (O), or (A)) from Theorem 7.8.5, the corresponding
parameters a, b, r , and m from Theorems 7.8.5 and 7.8.6, and the hull of each of the [8, 4]
even binary codes with the following generator matrices:
(a) G1 =
1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
0 0 1 0 1 1 0 1
0 0 0 1 1 1 1 0 ,

(b) G2 =
1 0 0 0 1 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 1 0 0 0
0 0 0 1 1 0 0 0 ,

(c) G3 =
1 0 0 0 0 1 1 1
0 1 0 0 1 0 0 0
0 0 1 0 1 0 0 0
0 0 0 1 1 0 0 0 .
Example 7.8.7 The MacWilliams equations, or their equivalent forms, can be used to
determine whether or not a set of nonnegative integers could arise as the weight distribution
of a code. If a set of integers were to be the weight distribution of a code, then the distribution
of the dual code can be computed from one of the forms of the MacWilliams equations.
The resulting dual distribution must obviously be a set of nonnegative integers; if not, the
original set of integers could not be the weight distribution of a code. Theorem 7.8.6 can be
used to eliminate weight distributions that the MacWilliams equations cannot. For example,
suppose that C is an [8, 4] binary code with weight distribution A0 = A2 = A6 = A8 = 1
and A4 = 12. Computing Ai⊥ , one finds that Ai⊥ = Ai , and so the MacWilliams equations do
not eliminate this possibility. However, by using Theorem 7.8.6, this possibility is eliminated
as Exercise 423 shows.
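The computation in Example 7.8.7 can be reproduced in a few lines. The sketch below is an illustration, not part of the text: it implements the MacWilliams transform for binary codes via the Krawtchouk polynomials K_k(i; n) and confirms that the hypothetical distribution is formally self-dual, so the MacWilliams equations alone cannot rule it out.

```python
from math import comb

def dual_distribution(A, n):
    """MacWilliams transform: the weight distribution of the dual of a binary
    [n, k] code whose weight distribution is A (A[i] = number of weight-i words)."""
    M = sum(A)                            # |C| = 2^k
    def kraw(kk, i):
        # binary Krawtchouk polynomial K_kk(i; n)
        return sum((-1)**j * comb(i, j) * comb(n - i, kk - j) for j in range(kk + 1))
    B = []
    for kk in range(n + 1):
        s = sum(A[i] * kraw(kk, i) for i in range(n + 1))
        assert s % M == 0                 # must divide evenly for a valid distribution
        B.append(s // M)
    return B

# Hypothetical [8,4] distribution of Example 7.8.7: A0 = A2 = A6 = A8 = 1, A4 = 12
A = [1, 0, 1, 0, 12, 0, 1, 0, 1]
print(dual_distribution(A, 8) == A)       # True: the transform cannot eliminate it
```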
Exercise 423 Referring to Example 7.8.7, show that there is no [8, 4] binary code with
weight distribution A0 = A2 = A6 = A8 = 1 and A4 = 12.
Corollary 7.8.8 If C is an [n, k] even binary code with 2^{k−1} doubly-even and 2^{k−1} singly-even codewords, then the hull of C is singly-even.
Exercise 424 Prove Corollary 7.8.8.
If C is a binary code with odd weight vectors, there are constraints on the number of
vectors of any weight congruent to a fixed value modulo 4; see [272].
7.9
Weight preserving transformations
As codes are vector spaces, it is natural to study the linear transformations that map one
code to another. An arbitrary linear transformation may send a codeword of one weight
to a codeword of a different weight. As we have seen throughout this book, the weights
of codewords play a significant role in the study of the codes and in their error-correcting
capability. Thus if we are to consider two codes to be “the same,” we would want to find
a linear transformation between the two codes that preserves weights; such a map is called
a weight preserving linear transformation. In this section, we will prove that such linear
transformations must be monomial maps. This result, first proved by MacWilliams in her
Ph.D. thesis [212], is the basis for our definition of equivalent codes in Chapter 1. The proof
we present is found in [24]; a proof using abelian group characters can be found in [347].
Let C be an [n, k] code over Fq . Define
µ(k) = (q^k − 1)/(q − 1).
By Exercise 425, the vector space Fqk has µ(k) 1-dimensional subspaces. Let
V1 , V2 , . . . , Vµ(k) be these subspaces, and let vi be a nonzero vector in Vi ; so {vi } is a
basis of Vi . The nonzero columns of a generator matrix for C are scalar multiples of some of
the vi s (of course, viewed as column vectors). Recall from Section 7.8 that Vi is orthogonal
to V j , denoted Vi ⊥V j , provided Vi and V j are orthogonal under the ordinary inner product;
this is clearly equivalent to vi · v j = 0. Now define a µ(k) × µ(k) matrix A = [ai, j ] that
describes the orthogonality relationship between the 1-dimensional spaces:
ai,j = 0 if Vi⊥Vj, and ai,j = 1 otherwise.
Exercise 425 Prove that there are (q k − 1)/(q − 1) 1-dimensional subspaces of Fqk .
We will first show that A is invertible over the rational numbers Q. In order to do this,
we need the following lemma.
Lemma 7.9.1 Let A be the matrix described above.
(i) The sum of the rows of A is the vector x = x1 · · · xµ(k), with xi = µ(k) − µ(k − 1) for
all i.
(ii) For 1 ≤ j ≤ µ(k), let y(j) = y1(j) · · · yµ(k)(j) be the sum of all of the rows of A having 0 in
column j. Then yj(j) = 0 and yi(j) = µ(k − 1) − µ(k − 2) for all i ≠ j.
Proof: To prove (i) it suffices to show that each column of A has µ(k) − µ(k − 1) entries that
equal 1; this is equivalent to showing that, for 1 ≤ i ≤ µ(k), there are µ(k − 1) subspaces
V j orthogonal to Vi . Let f i : Fqk → Fq , where f i (u) = vi · u. By Exercise 426(a), f i is a
surjective linear transformation. Therefore the dimension of the kernel of f i is k − 1. Thus
by Exercise 425, there are precisely µ(k − 1) of the V j in the kernel of f i proving that there
are µ(k − 1) subspaces V j orthogonal to Vi .
By (i), there are µ(k − 1) rows of A that have 0 in column j. For part (ii) it therefore suffices to show that, for each i ≠ j, there are µ(k − 2) 1-dimensional subspaces Vm
with Vm⊥Vi and Vm⊥Vj. Let fi,j : Fqk → Fq2, where fi,j(u) = (vi · u, vj · u). By Exercise 426(b), fi,j is a surjective linear transformation whose kernel has dimension k − 2. By
Exercise 425, there are precisely µ(k − 2) of the Vm in the kernel of f i, j , as required.
Exercise 426 Let v and w be (nonzero) independent vectors in Fqk .
(a) Let f : Fqk → Fq , where f (u) = v · u. Show that f is a surjective linear transformation.
(b) Let g : Fqk → Fq2 , where g(u) = (v · u, w · u). Show that g is a surjective linear transformation.
(See Exercise 419 for a comparable result.)
Theorem 7.9.2 The matrix A is invertible over Q.
Proof: It suffices to show that A has rank µ(k) over Q. This is accomplished if we show
that by taking some linear combination of rows of A we can obtain all of the standard
basis vectors e( j) , where e( j) has a 1 in position j and 0 elsewhere. But in the notation of
Lemma 7.9.1,
e(j) = (1/(µ(k) − µ(k − 1))) x − (1/(µ(k − 1) − µ(k − 2))) y(j).
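For q = 2 and k = 3 the matrix A is small enough to test directly. The sketch below is our own check, not part of the text: it builds the 7 × 7 orthogonality matrix and verifies both Theorem 7.9.2 (full rank over Q, using exact rational arithmetic) and the row-sum claim of Lemma 7.9.1(i).

```python
from fractions import Fraction

# The 1-dimensional subspaces of F_2^3 are spanned by the vectors 1..7 viewed
# as 3-bit integers, so mu(3) = 7.  Build the orthogonality matrix A and check
# Theorem 7.9.2 and Lemma 7.9.1(i) for q = 2, k = 3.
mu = lambda k: 2**k - 1                  # (q^k - 1)/(q - 1) with q = 2
dot = lambda u, v: bin(u & v).count("1") % 2

vs = list(range(1, mu(3) + 1))           # v_1, ..., v_7
A = [[0 if dot(u, v) == 0 else 1 for v in vs] for u in vs]

def rank_over_Q(M):
    """Rank of an integer matrix over the rationals, by exact elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    rank = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(rank, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        for i in range(len(M)):
            if i != rank and M[i][c] != 0:
                f = M[i][c] / M[rank][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[rank])]
        rank += 1
    return rank

print(rank_over_Q(A))                               # 7: A is invertible over Q
print({sum(row) for row in A} == {mu(3) - mu(2)})   # True: Lemma 7.9.1(i)
```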
Fix a generator matrix G of C. Denote the columns of G by col(G). Define the linear
transformation f : Fqk → C by f (u) = uG. (This is the standard way of encoding the message u using G as described in Section 1.11.) We define a vector w = w1 · · · wµ(k) with
integer entries:
wi = |{c ∈ col(G) | c ≠ 0, cT ∈ Vi}|.
Thus wi tells how many nonzero columns of G are in Vi, that is, how many columns of
G are nonzero scalar multiples of vi. So there are n − Σ_{i=1}^{µ(k)} wi zero columns of G. By
knowing w we can reproduce G up to permutation and scaling of columns. Notice that as
the vectors v1 , . . . , vµ(k) consist of all the basis vectors of the 1-dimensional subspaces of
Fqk , then f (v1 ), . . . , f (vµ(k) ) consist of all the basis vectors of the 1-dimensional subspaces
of C, as G has rank k and C is k-dimensional. The weights of these codewords f (vi ) are
determined by AwT .
Theorem 7.9.3 For 1 ≤ i ≤ µ(k), (AwT )i = wt( f (vi )).
Proof: We have

(AwT)i = Σ_{j=1}^{µ(k)} ai,j wj = Σ_{j∈Si} wj,
where Si = {j | Vj is not orthogonal to Vi}. By definition of wj,

(AwT)i = Σ_{j∈Si} |{c ∈ col(G) | c ≠ 0, cT ∈ Vj}| = Σ_{j=1}^{µ(k)} |{c ∈ col(G) | cT ∈ Vj and vi · cT ≠ 0}|,

because cT ∈ Vj and vi · cT ≠ 0 if and only if c ≠ 0, cT ∈ Vj, and Vi is not orthogonal to Vj. But

wt(f(vi)) = wt(vi G) = |{c ∈ col(G) | vi · cT ≠ 0}| = Σ_{j=1}^{µ(k)} |{c ∈ col(G) | cT ∈ Vj and vi · cT ≠ 0}|

as the transpose of every column of G is in some Vj. The result follows.
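Theorem 7.9.3 can be checked numerically. In the sketch below (our illustration; the columns are a standard choice of generator matrix for the [7, 4] binary Hamming code), each 1-dimensional subspace Vi of F2^4 contains the single nonzero vector vi = i, so w simply counts how often each value appears as a column of G.

```python
# Columns of a [7,4] binary Hamming generator matrix, as 4-bit integers.
G_cols = [0b1000, 0b0100, 0b0010, 0b0001, 0b0111, 0b1011, 0b1101]

dot = lambda u, v: bin(u & v).count("1") % 2
vs = list(range(1, 16))                              # v_i for mu(4) = 15
w = [sum(1 for c in G_cols if c == v) for v in vs]   # column-count vector w
A = [[0 if dot(u, v) == 0 else 1 for v in vs] for u in vs]

# (A w^T)_i versus wt(v_i G): the weight of v_i G is the number of columns c
# of G with v_i . c != 0, computed coordinate by coordinate.
AwT = [sum(A[i][j] * w[j] for j in range(15)) for i in range(15)]
weights = [sum(dot(v, c) for c in G_cols) for v in vs]
print(AwT == weights)   # True
```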
This result shows that the weight of every nonzero codeword, up to scalar multiplication,
is listed as some component of AwT . Now let C ′ be another [n, k] code over Fq such that there
is a surjective linear transformation φ : C → C ′ that preserves weights, that is, wt(φ(c)) =
wt(c) for all c ∈ C. Let g : Fqk → C ′ be the composition φ ◦ f so that g(u) = φ(uG) for all
u ∈ Fqk . As φ and f are linear, so is g. If we let G ′ be the matrix for g relative to the standard
bases for Fqk and Fqn , we have g(u) = uG ′ and G ′ is a generator matrix for C ′ ; also g is the
standard encoding function for the message u using G ′ . We can now prove our main result.
Theorem 7.9.4 There is a weight preserving linear transformation between [n, k] codes C
and C ′ over Fq if and only if C and C ′ are monomially equivalent. Furthermore, the linear
transformation agrees with the associated monomial transformation on every codeword
in C.
Proof: If C and C ′ are monomially equivalent, the linear transformation determined by the
monomial map preserves weights. For the converse, we use the notation developed up to
this point. We first note that for 1 ≤ i ≤ µ(k),
wt(g(vi)) = wt(φ(f(vi))) = wt(f(vi))    (7.16)

as φ is weight preserving. Let w′ = w1′ · · · w′µ(k) be defined analogously to w by

wi′ = |{c ∈ col(G′) | c ≠ 0, cT ∈ Vi}|.
By Theorem 7.9.3 used twice and (7.16),

(AwT)i = wt(f(vi)) = wt(g(vi)) = (Aw′T)i.
Thus AwT = Aw′ T and hence w = w′ by Theorem 7.9.2. Since w and w′ list the number
of nonzero columns of G and G ′ , respectively, that are in each Vi , the columns of G ′
are simply rearrangements of the columns of G upon possible rescaling. Thus there is a
monomial matrix M with G ′ = G M, and C and C ′ are therefore monomially equivalent.
Every codeword of C can be written as uG for some message u ∈ Fqk . But φ(uG) = uG ′ .
Since G ′ = G M, we have φ(uG) = uG M and so the linear transformation φ and the
monomial map M agree on every codeword of C.
A consequence of the proof of Theorem 7.9.4 is the following result on constant weight
codes due to Bonisoli [25]; the idea of this proof comes from Ward [347]. A linear code is
constant weight if all nonzero codewords have the same weight. Examples of such codes
are the simplex codes over Fq by Theorem 2.7.5.
Theorem 7.9.5 Let C be an [n, k] linear code over Fq with all nonzero codewords of the
same weight. Assume that C is nonzero and no column of a generator matrix is identically
zero. Then C is equivalent to the r -fold replication of a simplex code.
Proof: We use the notation of this section. Let G be a generator matrix for C. Choose
a column cT of G such that this column, or any scalar multiple of it, occurs the
maximum number of times r among the columns of G. Let x be an arbitrary nonzero element of
Fqk . Then there exists a nonsingular matrix B such that BcT = xT . Thus BG is a k × n
matrix that contains the column xT , or scalar multiples of this column, at least r times.
Note that BG = G ′ is another generator matrix for C, and so B induces a nonsingular
linear transformation of C. As all nonzero codewords of C have the same weight, this linear
transformation is weight preserving. By the proof of Theorem 7.9.4, there is a monomial
matrix M such that G ′ = G M. As G ′ contains the column xT or nonzero scalar multiples of
this column at least r times and G ′ = G M is merely the same matrix as G with its columns
rearranged and rescaled, G has r or more columns that are nonzero scalar multiples of xT .
By maximality of r and the fact that x is arbitrary, every nonzero vector in Fqk together with
its nonzero scalar multiples occurs exactly r times as columns of G. But this means that C
is the r -fold replication of the simplex code.
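As a minimal instance of Theorem 7.9.5 with r = 1, the simplex code itself is constant weight. The sketch below (our own check) enumerates the [7, 3, 4] binary simplex code S3 (see Example 7.9.6) and confirms that every nonzero codeword has weight 4.

```python
# The [7,3,4] binary simplex code S_3, generator rows as 7-bit integers.
G = [0b0001111, 0b0110011, 0b1010101]

codewords = {0}
for g in G:                              # enumerate all 8 codewords
    codewords |= {c ^ g for c in codewords}
weights = {bin(c).count("1") for c in codewords if c != 0}
print(weights)                           # {4}: constant weight, as Theorem 2.7.5 predicts
```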
Example 7.9.6 Let C be the constant weight binary code with generator matrix
G =
0 0 0 1 1 1 1
0 1 1 0 0 1 1
1 0 1 0 1 0 1 ;
this is the simplex code S 3 of Section 1.8. The subcode generated by the second and third
vectors in the above matrix is also constant weight. Puncturing on its zero coordinate, we
obtain the constant weight code C 1 with generator matrix
G1 =
0 1 1 0 1 1
1 0 1 1 0 1 .
This is obviously the 2-fold replication of the simplex code S 2 , which has generator matrix
G2 =
0 1 1
1 0 1 .

7.10
Generalized Hamming weights
There is an increasing sequence of positive integers associated with a code, called the generalized Hamming weights, which in many ways behaves both like the minimum weight of a code and like its weight distribution. This sequence, which includes the minimum weight of the code,
satisfies both Generalized Singleton and Generalized Griesmer Bounds. As with weight distributions, the generalized Hamming weights for the dual code can be determined from those
of the original code; furthermore, the generalized Hamming weights satisfy MacWilliams
type equations.
Let C be an [n, k, d] code over Fq . If D is a subcode of C, the support supp(D) of D is
the set of coordinates where not all codewords of D are zero. So |supp(D)| is the number of
nonzero columns in a generator matrix for D. For 1 ≤ r ≤ k, the r th-generalized Hamming
weight of C, which is also called the r th-minimum support weight, is
dr (C) = dr = min{|supp(D)| | D is an [n, r ] subcode of C}.
The set {d1 (C), d2 (C), . . . , dk (C)} is called the weight hierarchy of C. The generalized Hamming weights were introduced in 1977 by Helleseth, Kløve, and Mykkeltveit [125]. Victor
Wei [348] studied them in connection with wire-tap channels of type II; see Section 5 of
[329]. They are also important in the study of the trellis structure of codes as described in
Section 5 of [336].
The following theorem contains basic information about generalized Hamming weights.
Theorem 7.10.1 Let C be an [n, k, d] code over Fq . The following hold:
(i) d = d1 (C) < d2 (C) < · · · < dk (C) ≤ n.
(ii) If M is an n × n monomial map and γ is an automorphism of Fq , then the weight
hierarchy of C is the same as the weight hierarchy of C Mγ .
Proof: Suppose that D is an [n, r ] subcode of C with dr (C) = |supp(D)|. Choose one of the
dr (C) nonzero columns of a generator matrix G r of D and perform row operations on G r ,
obtaining a new generator matrix G r′ of D, so that this column has a single nonzero entry in
the first row. The subcode of D generated by rows 2 through r of G r′ is an [n, r − 1] code
with support size strictly less than dr (C). Thus dr −1 (C) < dr (C). By choosing a vector in C of
minimum weight d to generate an [n, 1] subcode of C, we see that d = d1 (C); thus (i) follows.
In (ii), since applying Mγ rescales columns, permutes coordinates, and acts componentwise by γ, |supp(D)| = |supp(DMγ)|. Part (ii) follows.
Example 7.10.2 The [15, 4, 8] binary simplex code C has generator matrix

G =
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 .
By Theorem 2.7.5, all nonzero codewords have weight 8. The generalized Hamming weights
are d1 = d = 8, d2 = 12, d3 = 14, and d4 = 15. To compute these values, first observe that
because the sum of two distinct weight 8 codewords has weight 8, two weight 8 codewords
have exactly four 1s in common positions. Therefore the support of a 2-dimensional subcode
must be of size at least 12, and the first two rows of G generate a 2-dimensional code with
support size 12. As G has no zero columns, d4 = 15, and so by Theorem 7.10.1 d3 = 13
or 14. Suppose that D is a 3-dimensional subcode with support size 13. Let M be a matrix
whose rows are the codewords of D. In any given nonzero column, by Exercise 373, exactly
half the rows of M have a 1 in that column. So M contains 13 · 4 = 52 1s; but M has
seven rows of weight 8, implying that M has 56 1s, a contradiction. So d3 = 14.
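The hierarchy computed in Example 7.10.2 can be verified by exhaustive search. The sketch below (our own check, not from the text) uses the fact that the support of an r-dimensional binary subcode is the bitwise OR of any basis of it, so dr is the minimum popcount of that OR over all independent r-sets of codewords.

```python
from itertools import combinations

# [15,4,8] binary simplex code: columns of G are the nonzero vectors of F_2^4.
k, n = 4, 15
G = []
for row in range(k):
    G.append(sum(1 << (n - col) for col in range(1, 16) if (col >> (k - 1 - row)) & 1))

def codeword(m):                          # encode a 4-bit message integer m
    c = 0
    for i in range(k):
        if (m >> i) & 1:
            c ^= G[i]
    return c

def rank_f2(vecs):                        # rank of integer bit-vectors over F_2
    basis, r = list(vecs), 0
    for bit in reversed(range(max(basis).bit_length())):
        piv = next((i for i in range(r, len(basis)) if (basis[i] >> bit) & 1), None)
        if piv is None:
            continue
        basis[r], basis[piv] = basis[piv], basis[r]
        for i in range(len(basis)):
            if i != r and (basis[i] >> bit) & 1:
                basis[i] ^= basis[r]
        r += 1
    return r

def d_r(r):
    best = n
    for msgs in combinations(range(1, 1 << k), r):
        if rank_f2(msgs) == r:            # independent messages give an r-dim subcode
            sup = 0
            for m in msgs:
                sup |= codeword(m)
            best = min(best, bin(sup).count("1"))
    return best

print([d_r(r) for r in range(1, 5)])      # [8, 12, 14, 15], as in Example 7.10.2
```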
There is a relationship between the weight hierarchy of C and the weight hierarchy of
C ⊥ due to Wei [348]. First we prove a lemma.
Lemma 7.10.3 Let C be an [n, k] code over Fq . For a positive integer s < n, let r be the
largest integer such that C has a generator matrix G of the form
G=
G1
G2
O
,
G3
where G 1 is an r × s matrix of rank r and G 3 is a (k − r ) × (n − s) matrix. Then C has a
parity check matrix of the form
H=
H1
O
H2
,
H3
where H1 is an (s − r) × s matrix of rank s − r and H3 is an (n − k − s + r) × (n − s)
matrix of rank n − k − s + r . Furthermore, n − k − s + r is the largest dimension of a
subspace of C ⊥ with support contained in the last n − s coordinates.
Proof: Let H = [A B], where A is an (n − k) × s matrix and B is an (n − k) × (n − s)
matrix. Since the rows of H are orthogonal to the rows of G, the rows of A are orthogonal
to the rows of G 1 . Hence A has rank at most s − r . Suppose that the rank of A is less than
s − r . Then there is a vector of length s not in the code generated by G 1 that is orthogonal
to the rows of A. Appending n − s zeros to the end of this vector, we obtain a vector
orthogonal to the rows of H , which therefore is in C but is not in the code generated by
[G 1 O], contradicting the maximality of r . Thus the rank of A equals s − r and hence C
has a parity check matrix H satisfying the conclusions of the lemma.
Theorem 7.10.4 Let C be an [n, k] code over Fq . Then
{dr (C) | 1 ≤ r ≤ k} = {1, 2, . . . , n} \ {n + 1 − dr (C ⊥ ) | 1 ≤ r ≤ n − k}.
Proof: Let s = dr (C ⊥ ) for some r with 1 ≤ r ≤ n − k. It suffices to show that there does
not exist a t with 1 ≤ t ≤ k such that dt (C) = n + 1 − s.
As s = dr (C ⊥ ), there is a set of s coordinates that supports an r -dimensional subcode of
C ⊥ . Reorder columns so that these are the first s coordinates. Thus C ⊥ has an (n − k) × n
generator matrix of the form
H=
H1
H2
O
,
H3
where H1 is an r × s matrix of rank r . As dr (C ⊥ ) < dr +1 (C ⊥ ) by Theorem 7.10.1, no larger
subcode has support in these s coordinates. Applying Lemma 7.10.3 with C ⊥ in place of C
and n − k in place of k, there is a (k − s + r )-dimensional subcode of C which is zero on
the first s coordinate positions. Hence
dk−s+r (C) ≤ n − s.
(7.17)
Assume to the contrary that there is a t with dt (C) = n + 1 − s. It follows from (7.17)
that
t > k − s + r.
(7.18)
Replacing C by a permutation equivalent code, if necessary, C has a k × n generator matrix
G=
G1
G2
O
,
G3
where G1 is a t × (n + 1 − s) matrix of rank t. Again, as dt(C) < dt+1(C), Lemma 7.10.3 applies with t in place of r and n + 1 − s in place of s. So there is an (s − 1 − k + t)-dimensional subcode of C⊥ that is zero on the first n + 1 − s positions. Since s = dr(C⊥),
we have s − 1 − k + t < r and this contradicts (7.18).
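Theorem 7.10.4 is easy to test numerically on a small code. The sketch below is an illustration with an arbitrarily chosen [6, 3] binary code: it computes the weight hierarchies of both C and C⊥ by brute force and checks the complementation identity.

```python
from itertools import combinations

n = 6
G = [0b100110, 0b010101, 0b001011]        # an arbitrary [6,3] binary code C

def span(gens):
    words = {0}
    for g in gens:
        words |= {w ^ g for w in words}
    return words

dot = lambda u, v: bin(u & v).count("1") % 2
C = span(G)
C_dual = {v for v in range(1 << n) if all(dot(v, g) == 0 for g in G)}

def hierarchy(words):
    """Weight hierarchy by brute force: d_r = min support of an r-dim subcode."""
    k = len(words).bit_length() - 1
    nz = [w for w in words if w]
    ds = []
    for r in range(1, k + 1):
        best = n
        for sub in combinations(nz, r):
            if len(span(sub)) == 1 << r:   # the r chosen words are independent
                sup = 0
                for w in sub:
                    sup |= w
                best = min(best, bin(sup).count("1"))
        ds.append(best)
    return ds

dC, dCd = hierarchy(C), hierarchy(C_dual)
lhs = set(dC)
rhs = set(range(1, n + 1)) - {n + 1 - d for d in dCd}
print(dC, dCd, lhs == rhs)                 # [3, 5, 6] [3, 5, 6] True
```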
We now illustrate how Theorem 7.10.4 can be used to compute the weight hierarchy of
self-dual codes.
Example 7.10.5 Let C be a [24, 12, 8] self-dual binary code. (This must be the extended binary Golay code.) We compute {dr(C) | 1 ≤ r ≤ 12} using the self-duality of
C and Theorem 7.10.4. Since d1(C) = 8, we have 1, 2, . . . , 7 ∉ {dr(C) | 1 ≤ r ≤ 12} by
Theorem 7.10.1, and thus 24, 23, . . . , 18 ∈ {dr(C⊥) | 1 ≤ r ≤ 12} and 17 ∉ {dr(C⊥) | 1 ≤
r ≤ 12}. It is easy to see that a 2-dimensional self-orthogonal binary code of minimum
weight 8 cannot exist for length less than 12. Therefore d2(C) ≥ 12 and 9, 10, 11 ∉
{dr(C) | 1 ≤ r ≤ 12}. Thus 16, 15, 14 ∈ {dr(C⊥) | 1 ≤ r ≤ 12}. As C = C⊥, we now conclude that {8, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24} is contained in {dr(C) | 1 ≤ r ≤ 12}
and 12 ≤ d2(C) ≤ 13. An easy argument shows that a [13, 2, 8] self-orthogonal binary code
must have a column of zeros in its generator matrix. Hence d2(C) = 12. So we have:
r      1   2   3   4   5   6   7   8   9  10  11  12
dr(C)  8  12  14  15  16  18  19  20  21  22  23  24
Exercise 427 Do the following:
(a) Show that a 2-dimensional self-orthogonal binary code of minimum weight 8 cannot
exist for length less than 12.
(b) Show that a [13, 2, 8] self-orthogonal binary code must have a column of zeros in its
generator matrix.
Exercise 428 Find the weight hierarchy of the [16, 8, 4] self-dual doubly-even binary code
with generator matrix G 1 of Example 7.5.6.
Exercise 429 Find the weight hierarchy of the [16, 8, 4] self-dual doubly-even binary code
with generator matrix G 2 of Example 7.5.6.
Exercise 430 What are the weight hierarchies of the [7, 3, 4] binary simplex code and the
[7, 4, 3] binary Hamming code?
Wei [348] observed that the minimum support weights generalize the Singleton Bound.
Theorem 7.10.6 (Generalized Singleton Bound) For an [n, k, d] linear code over Fq ,
dr ≤ n − k + r for 1 ≤ r ≤ k.
Proof: The proof follows by induction on k − r . When k − r = 0, dr = dk ≤ n =
n − k + r by Theorem 7.10.1. Assuming dr ≤ n − k + r for some r ≤ k, then by the same
theorem, dr −1 ≤ dr − 1 ≤ n − k + (r − 1), yielding the result.
The Singleton Bound is the case r = 1 of the Generalized Singleton Bound. It follows
from the proof of the Generalized Singleton Bound that if a code meets the Singleton
Bound, then dr −1 = dr − 1 for 1 < r ≤ k and dk = n. Codes meeting the Singleton Bound
are MDS codes by definition; so MDS codes also meet the Generalized Singleton Bound
and thus their generalized Hamming weights are determined.
Theorem 7.10.7 Let C be an MDS code over Fq . Then:
(i) C meets the Generalized Singleton Bound for all r with 1 ≤ r ≤ k, and
(ii) dr = d + r − 1 for 1 ≤ r ≤ k.
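Theorem 7.10.7 can be illustrated on a tiny MDS code over F3. The generator matrix below is our own choice of a [4, 2, 3] code; the sketch verifies d = n − k + 1 and that d2 = d + 1, so dr = d + r − 1.

```python
from itertools import product

# An arbitrarily chosen [4,2,3] MDS code over F_3.
q, n, k = 3, 4, 2
G = [(1, 0, 1, 1), (0, 1, 1, 2)]

def codeword(msg):
    return tuple(sum(m * g for m, g in zip(msg, col)) % q for col in zip(*G))

words = [codeword(msg) for msg in product(range(q), repeat=k)]
d = min(sum(x != 0 for x in w) for w in words if any(w))
print(d)                                   # 3 = n - k + 1, so the code is MDS

# d_2 is the support size of the whole code, its only 2-dimensional subcode.
d2 = sum(any(w[i] for w in words) for i in range(n))
print((d, d2) == (3, 4))                   # True: d_r = d + r - 1
```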
We now investigate a generalization of the Griesmer Bound [124, 128] that the minimum
support weights satisfy. Let G be a generator matrix for a code C. Let col(G) be the set of
distinct columns of G. If x ∈ col(G), define m(x) to be the multiplicity of x in G, and if
U ⊆ col(G), define
m(U) = Σ_{x∈U} m(x).
For U ⊆ col(G), let span{U } be the subspace of Fqk spanned by U . Finally, for 1 ≤ r ≤ k,
let F k,r be the set of all r -dimensional subspaces of Fqk spanned by columns of G. Before
stating the Generalized Griesmer Bound, we need two lemmas. The first lemma gives a
formula for dr (C) in terms of the function m(U ).
Lemma 7.10.8 Let C be an [n, k] code with generator matrix G. For 1 ≤ r ≤ k,
dr (C) = n − max{m(U ) | U ⊆ col(G) and span{U } ∈ F k,k−r }.
Proof: Let D be an [n, r ] subcode of C. Then there exists an r × k matrix A of rank r such
that AG is a generator matrix of D. Moreover, for each such matrix A, AG is a generator
matrix of an [n, r ] subcode of C. By definition, |supp(D)| = n − m, where m is the number
of zero columns of AG. Hence |supp(D)| = n − m(U ), where
U = {y ∈ col(G) | Ay = 0}.
Since the rank of A is r and the rank of G is k, span{U } is in F k,k−r . Conversely, if
U ⊆ col(G) where span{U } is in F k,k−r , there is an r × k matrix A of rank r such that
Ay = 0 if and only if y ∈ span{U }. But AG is a generator matrix for an [n, r ] subcode D
of C. Thus as dr (C) is the minimum support size of any such D, the result follows.
Using this lemma and its proof, we can relate dr (C) and dr −1 (C).
Lemma 7.10.9 Let C be an [n, k] code over Fq . Then for 1 < r ≤ k,
(q r − 1)dr −1 (C) ≤ (q r − q)dr (C).
Proof: Let G be a generator matrix for C. By Lemma 7.10.8 and its proof, there exists an
r × k matrix A of rank r such that U = {y ∈ col(G) | Ay = 0} where span{U } ∈ F k,k−r
and dr (C) = n − m(U ). Furthermore, AG generates an r -dimensional subcode D of C. By
Exercise 431 there are t = (q r − 1)/(q − 1) (r − 1)-dimensional subcodes V1 , V2 , . . . , Vt
of D. To each Vi is associated an (r − 1) × k matrix Ai of rank r − 1 such that Ai G is
a generator matrix of Vi . Letting Ui = {y ∈ col(G) | Ai y = 0}, we see that span{Ui } ∈
F k,k−r +1 as in the proof of Lemma 7.10.8. Also U ⊆ Ui , as a column of AG is 0 implying
that the same column of Ai G is 0 because Vi is a subcode of D. Conversely, if U ′ ⊆ col(G)
where U ⊆ U ′ and span{U ′ } ∈ F k,k−r +1 , there is an (r − 1) × k matrix A′ of rank r − 1
such that U ′ = {y ∈ col(G) | A′ y = 0}. As U ⊆ U ′ , D = span{U } ⊆ span{U ′ }. So A′ G
generates an (r − 1)-dimensional subspace, say Vi , of D and hence U ′ = Ui .
By Lemma 7.10.8,
dr −1 (C) ≤ n − m(Ui )
for 1 ≤ i ≤ t,
and thus
dr (C) − dr −1 (C) ≥ n − m(U ) − [n − m(Ui )] = m(Ui \ U )
for 1 ≤ i ≤ t.
Every column of G not in U is in exactly one Ui because U together with this column, plus
possibly other columns, spans a subspace of F k,k−r +1 by the previous paragraph. Therefore
t[dr(C) − dr−1(C)] ≥ Σ_{i=1}^{t} m(Ui \ U) = n − m(U) = dr(C),

and the lemma follows.
Exercise 431 Prove that there are t = (q r − 1)/(q − 1) (r − 1)-dimensional subcodes of
an r -dimensional code over Fq .
Theorem 7.10.10 (Generalized Griesmer Bound) Let C be an [n, k] code over Fq . Then
for 1 ≤ r ≤ k,
n ≥ dr(C) + Σ_{i=1}^{k−r} ⌈((q − 1)/(q^i(q^r − 1))) dr(C)⌉.
Proof: If r = k, the inequality reduces to the obvious assertion n ≥ dk (C); see
Theorem 7.10.1. Now assume that r < k. Without loss of generality, using Theorem 7.10.1(ii), we may assume that C has a generator matrix G of the form
G=
G1
G2
O
,
G3
where G 1 is an r × dr (C) matrix of rank r and G 3 is a (k − r ) × (n − dr (C)) matrix. The
matrix [G 2 G 3 ] has rank k − r . If G 3 has rank less than k − r , there is a nonzero codeword
x in the code generated by [G 2 G 3 ], which is zero on the last n − dr (C) coordinates,
contradicting dr (C) < dr +1 (C) from Theorem 7.10.1. Therefore G 3 has rank k − r and
generates an [n − dr (C), k − r ] code C 3 .
Let c = c1 · · · cn ∈ C where wt(cdr (C)+1 · · · cn ) = a > 0. The subcode of C generated by
c and the first r rows of G is an (r + 1)-dimensional subcode of C with support size
a + dr (C). So a ≥ dr +1 (C) − dr (C) implying that the minimum weight of C 3 is at least
dr +1 (C) − dr (C), by choosing the above c so that wt(cdr (C)+1 · · · cn ) has minimum weight in
C 3 . By the Griesmer Bound applied to C 3 ,
n − dr(C) ≥ Σ_{i=0}^{k−r−1} ⌈(dr+1(C) − dr(C))/q^i⌉ = Σ_{i=1}^{k−r} ⌈(dr+1(C) − dr(C))/q^{i−1}⌉.

By Lemma 7.10.9, (q^{r+1} − 1)dr(C) ≤ (q^{r+1} − q)dr+1(C). Using this,

dr+1(C) − dr(C) ≥ (1 − (q^{r+1} − q)/(q^{r+1} − 1)) dr+1(C) = ((q − 1)/(q^{r+1} − 1)) dr+1(C).

Therefore,

n − dr(C) ≥ Σ_{i=1}^{k−r} ⌈((q − 1)/(q^{i−1}(q^{r+1} − 1))) dr+1(C)⌉.

But again using Lemma 7.10.9,

n − dr(C) ≥ Σ_{i=1}^{k−r} ⌈((q − 1)/(q^{i−1}(q^{r+1} − q))) dr(C)⌉ = Σ_{i=1}^{k−r} ⌈((q − 1)/(q^i(q^r − 1))) dr(C)⌉.
The Griesmer Bound is the case r = 1 of the Generalized Griesmer Bound. We now show
that if a code meets the Griesmer Bound, it also meets the Generalized Griesmer Bound for
all r , and its weight hierarchy is uniquely determined; the binary case of this result is found
in [128]. To simplify the notation we let
br = dr(C) + Σ_{i=1}^{k−r} ⌈((q − 1)/(q^i(q^r − 1))) dr(C)⌉    for 1 ≤ r ≤ k.
The Generalized Griesmer Bound then asserts that n ≥ br for 1 ≤ r ≤ k. We first show that
br +1 ≥ br and determine when br +1 = br .
Lemma 7.10.11 Let 1 ≤ r < k. Then br+1 ≥ br. Furthermore, br+1 = br if and only if both
of the following hold:
(i) dr+1(C) = ⌈((q^{r+1} − 1)/(q^{r+1} − q)) dr(C)⌉, and
(ii) ⌈((q − 1)/(q^i(q^{r+1} − 1))) dr+1(C)⌉ = ⌈((q − 1)/(q^{i+1}(q^r − 1))) dr(C)⌉ for 1 ≤ i ≤ k − r − 1.
Proof: Lemma 7.10.9 implies that

dr+1(C) ≥ ⌈((q^{r+1} − 1)/(q^{r+1} − q)) dr(C)⌉    (7.19)

and

⌈((q − 1)/(q^i(q^{r+1} − 1))) dr+1(C)⌉ ≥ ⌈((q − 1)/(q^i(q^{r+1} − q))) dr(C)⌉ = ⌈((q − 1)/(q^{i+1}(q^r − 1))) dr(C)⌉.    (7.20)
By (7.19) and (7.20),

br+1 = dr+1(C) + Σ_{i=1}^{k−r−1} ⌈((q − 1)/(q^i(q^{r+1} − 1))) dr+1(C)⌉
     ≥ ⌈((q^{r+1} − 1)/(q^{r+1} − q)) dr(C)⌉ + Σ_{i=1}^{k−r−1} ⌈((q − 1)/(q^{i+1}(q^r − 1))) dr(C)⌉
     = dr(C) + ⌈((q − 1)/(q(q^r − 1))) dr(C)⌉ + Σ_{i=2}^{k−r} ⌈((q − 1)/(q^i(q^r − 1))) dr(C)⌉
     = dr(C) + Σ_{i=1}^{k−r} ⌈((q − 1)/(q^i(q^r − 1))) dr(C)⌉ = br.

Clearly, br+1 = br if and only if equality holds in (7.19) and in (7.20) when 1 ≤ i ≤
k − r − 1.
Theorem 7.10.12 Let C be an [n, k, d] code over Fq meeting the Griesmer Bound. Then:
(i) C meets the Generalized Griesmer Bound for all r with 1 ≤ r ≤ k, and
(ii) dr(C) = Σ_{i=0}^{r−1} ⌈d/q^i⌉ for 1 ≤ r ≤ k.
Proof: By the Generalized Griesmer Bound n ≥ bk; using Lemma 7.10.11,

n ≥ bk ≥ bk−1 ≥ · · · ≥ b1.

As C meets the Griesmer Bound, b1 = n. Therefore br = n for 1 ≤ r ≤ k, giving (i).
Note that (ii) holds when r = 1, since d1(C) = d. Assume 1 < r ≤ k. By Lemma 7.10.11,

⌈((q − 1)/(q^i(q^s − 1))) ds(C)⌉ = ⌈((q − 1)/(q^{i+1}(q^{s−1} − 1))) ds−1(C)⌉,

for 1 < s ≤ k and 1 ≤ i ≤ k − s. Applying this inductively for s = r, r − 1, . . . , 2, we have

⌈((q − 1)/(q^i(q^r − 1))) dr(C)⌉ = ⌈((q − 1)/(q^{i+r−1}(q − 1))) d1(C)⌉ = ⌈d/q^{i+r−1}⌉,

for 1 ≤ i ≤ k − r, as d1(C) = d by Theorem 7.10.1. Since br = n, as C meets the Griesmer
Bound, we have

Σ_{i=0}^{k−1} ⌈d/q^i⌉ = n = dr(C) + Σ_{i=1}^{k−r} ⌈((q − 1)/(q^i(q^r − 1))) dr(C)⌉ = dr(C) + Σ_{i=1}^{k−r} ⌈d/q^{i+r−1}⌉.

So

dr(C) = Σ_{i=0}^{k−1} ⌈d/q^i⌉ − Σ_{i=1}^{k−r} ⌈d/q^{i+r−1}⌉ = Σ_{i=0}^{r−1} ⌈d/q^i⌉,

and thus (ii) holds.
Example 7.10.13 By Theorem 2.7.5, the [(q^k − 1)/(q − 1), k, q^{k−1}] simplex code C over
Fq meets the Griesmer Bound. So by Theorem 7.10.12, the weight hierarchy of C is

dr(C) = Σ_{i=0}^{r−1} ⌈q^{k−1}/q^i⌉ = Σ_{i=0}^{r−1} q^{k−1−i} = q^{k−r}(q^r − 1)/(q − 1).
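The closed form in Example 7.10.13 agrees with the ceiling sum of Theorem 7.10.12(ii); the sketch below checks this for the ternary simplex code with k = 3, that is, the [13, 3, 9] code over F3.

```python
from math import ceil

# Ternary simplex code: q = 3, k = 3, length (q^k - 1)/(q - 1) = 13, d = q^{k-1} = 9.
q, k = 3, 3
d = q ** (k - 1)
for r in range(1, k + 1):
    via_sum = sum(ceil(d / q ** i) for i in range(r))        # Theorem 7.10.12(ii)
    closed = q ** (k - r) * (q ** r - 1) // (q - 1)          # Example 7.10.13
    print(r, via_sum, closed, via_sum == closed)
```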
Exercise 432 From Example 7.10.13, what is the weight hierarchy of:
(a) the [15, 4, 8] binary simplex code (see also Example 7.10.2), and
(b) the [121, 5, 81] ternary simplex code?
Exercise 433 Using the results of Exercise 432 and Theorem 7.10.4, find the weight
hierarchy of:
(a) the [15, 11, 3] binary Hamming code, and
(b) the [121, 116, 3] ternary Hamming code.
Exercise 434 Using the results of Example 7.10.13 and Theorem 7.10.4, find the weight
hierarchy of the [(q k − 1)/(q − 1), (q k − 1)/(q − 1) − k, 3] Hamming code Hq,k over
Fq .
As remarked earlier, generalized MacWilliams equations for generalized Hamming
weights have been established by Barg [14], Kløve [175], and Simonis [310]. We refer
the interested reader to those papers for details.
The generalized Hamming weights have been computed for other codes. Wei [348]
computed the weight hierarchy of the binary Reed–Muller codes of all orders and the
Hamming codes; see Exercise 434 and Section 5 of [329]. (Note that even though the
weight hierarchies of the binary Reed–Muller codes are known, the weight distributions of
these codes are unknown for orders larger than 2.) The weight hierarchies are known for
codes that meet the Griesmer Bound by Theorem 7.10.12; for codes that have length one
more than the Griesmer Bound, the weight hierarchies are also known [127].1
Research Problem 7.10.14 For families of codes such as BCH, quadratic residue, duadic,
or lexicodes, find further information, either exact or asymptotic, about some or all of the
generalized Hamming weights.
1 The proof in [127] is for binary codes only, but generalizes to nonbinary codes as well.
8
Designs
In this chapter we discuss some basic properties of combinatorial designs and their relationship to codes. In Section 6.5, we showed how duadic codes can lead to projective planes.
Projective planes are a special case of t-designs, also called block designs, which are the
main focus of this chapter. As with duadic codes and projective planes, most designs we
study arise as the supports of codewords of a given weight in a code.
8.1
t-designs
A t-(v, k, λ) design, or briefly a t-design, is a pair (P, B) where P is a set of v elements,
called points, and B is a collection of distinct subsets of P of size k, called blocks, such that
every subset of points of size t is contained in precisely λ blocks. (Sometimes one considers
t-designs in which the collection of blocks is a multiset, that is, blocks may be repeated.
In such a case, a t-design without repeated blocks is called simple. We will generally only
consider simple t-designs and hence, unless otherwise stated, the expression “t-design” will
mean “simple t-design.”) The number of blocks in B is denoted by b, and, as we will see
shortly, is determined by the parameters t, v, k, and λ. There are several special cases of
t-designs that have their own terminology:
• If λ = 1, a t-design is called a Steiner S(t, k, v) system or a Steiner t-design.
• If b = v, the t-design is symmetric and k − λ is called its order. Nontrivial symmetric
t-designs exist only for t ≤ 2.
• A symmetric 2-(v, k, 1) design (or, in the alternate notation, a symmetric S(2, k, v)
design) turns out to be a projective plane of order k − 1. This is not obvious from the
definition of projective plane in Section 6.5. In fact, we prove in Theorem 8.6.1 that a
set of points and lines forms a projective plane if and only if the set of points and lines
forms a symmetric 2-(v, k, 1) design.
It is often convenient to describe a t-design by giving a matrix that indicates the points
that are in each block. The incidence matrix for a t-design (P, B) is a matrix with entries
0 or 1 whose rows are indexed by the blocks of B and whose columns are indexed by
the points of P where the (i, j)-entry is 1 if and only if the ith block contains the jth
point. The incidence matrix as defined here is the transpose of the incidence matrix defined
by some other authors. This definition is used because, in some applications, the rows
of the incidence matrix will represent the supports of codewords of a code, as we now
illustrate.
Example 8.1.1 The binary [7, 4, 3] Hamming code H3 is a cyclic code with generator
polynomial g(x) = 1 + x + x^3. The following matrix A lists the codewords of weight 3 in H3:

        1 1 0 1 0 0 0
        0 1 1 0 1 0 0
        0 0 1 1 0 1 0
    A = 0 0 0 1 1 0 1
        1 0 0 0 1 1 0
        0 1 0 0 0 1 1
        1 0 1 0 0 0 1
If we label the coordinates P = {0, 1, 2, 3, 4, 5, 6}, then the supports of these seven
codewords are:
B = {{0, 1, 3}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 0}, {5, 6, 1}, {6, 0, 2}}.
It is easy to check that (P, B) is a 2-(7, 3, 1) design (or an S(2, 3, 7) Steiner system). As
the number of points is the number of blocks, the design is symmetric and so the design is
a projective plane of order 3 − 1 = 2. The matrix A is the incidence matrix of this design.
Compare this example to Exercise 358.
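The design property claimed here is easy to confirm by machine. The following Python sketch (our own helper code; the generator polynomial g(x) = 1 + x + x^3 is the one in the example) builds all 16 codewords of H3, extracts the weight 3 supports, and checks that every pair of points lies in exactly one block:

```python
from itertools import combinations, product

n = 7
g = [1, 1, 0, 1]                    # g(x) = 1 + x + x^3

def encode(msg):
    """Multiply a degree-<4 message polynomial by g(x) over GF(2)."""
    c = [0] * n
    for i, m in enumerate(msg):
        if m:
            for j, gj in enumerate(g):
                c[i + j] ^= gj
    return tuple(c)

codewords = {encode(msg) for msg in product([0, 1], repeat=4)}
blocks = [frozenset(i for i, ci in enumerate(c) if ci)
          for c in codewords if sum(c) == 3]

assert len(blocks) == 7
# 2-(7, 3, 1): every pair of points lies in exactly one block.
for pair in combinations(range(7), 2):
    assert sum(1 for B in blocks if set(pair) <= B) == 1
```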
Example 8.1.2 Extend the Hamming code of the previous example, where we denote the
extended coordinate by ∞, to obtain the [8, 4, 4] binary code Ĥ3. Let P be the coordinates
{0, 1, 2, 3, 4, 5, 6, ∞}. There are 14 codewords of weight 4 in Ĥ3, and their supports are
the set
B = {{0, 1, 3, ∞}, {2, 4, 5, 6}, {1, 2, 4, ∞}, {0, 3, 5, 6}, {2, 3, 5, ∞},
{0, 1, 4, 6}, {3, 4, 6, ∞}, {0, 1, 2, 5}, {4, 5, 0, ∞}, {1, 2, 3, 6},
{5, 6, 1, ∞}, {0, 2, 3, 4}, {6, 0, 2, ∞}, {1, 3, 4, 5}}.
Notice how the supports containing ∞ are related to the supports of the weight 3 codewords
of H3 , and how the supports containing ∞ are related to the supports not containing ∞.
It is easy to check that every set of three coordinates is contained in precisely one block.
Thus (P, B) is a 3-(8, 4, 1) design or an S(3, 4, 8) Steiner system. This design is not
symmetric.
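The Steiner property can likewise be checked computationally; the sketch below (our own construction) builds the 14 blocks exactly as listed, coding ∞ as the point 7, and verifies that each triple of points lies in exactly one block:

```python
from itertools import combinations

# The 14 blocks listed above, with ∞ coded as the point 7: the seven Fano
# triples with 7 adjoined, together with their complements in {0,...,6}.
fano = [{i % 7, (i + 1) % 7, (i + 3) % 7} for i in range(7)]
blocks = [frozenset(B | {7}) for B in fano] + \
         [frozenset(set(range(7)) - B) for B in fano]

# Steiner property: every 3-subset of the 8 points lies in exactly one block.
for triple in combinations(range(8), 3):
    assert sum(1 for B in blocks if set(triple) <= B) == 1
```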
Exercise 435 Verify the claims of Examples 8.1.1 and 8.1.2 that (P, B) is a block
design.
As with codes, there is the notion of equivalence of designs. Two designs (P 1 , B 1 )
and (P 2 , B 2 ) are equivalent provided there is a bijection from P 1 onto P 2 that induces a
bijection from B 1 onto B 2 . A permutation of P is an automorphism of (P, B) provided
the permutation induces a bijection on B. The automorphism group of (P, B), denoted
Aut(P, B), is the group of all automorphisms of (P, B).
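In computational terms, a permutation is an automorphism exactly when the image of the block set equals the block set. A minimal sketch, using the Fano blocks of Example 8.1.1 (function names are ours):

```python
fano = {frozenset({i % 7, (i + 1) % 7, (i + 3) % 7}) for i in range(7)}

def is_automorphism(perm, blocks):
    """perm maps each point to its image; check it permutes the block set."""
    return {frozenset(perm[p] for p in B) for B in blocks} == blocks

shift = {i: (i + 1) % 7 for i in range(7)}   # the 7-cycle (0,1,2,3,4,5,6)
assert is_automorphism(shift, fano)
# The transposition swapping 0 and 1 sends the block {1,2,4} to the
# non-block {0,2,4}, so it is not an automorphism.
assert not is_automorphism({0: 1, 1: 0, **{i: i for i in range(2, 7)}}, fano)
```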
Exercise 436 Consider the 2-(7, 3, 1) and 3-(8, 4, 1) designs presented in Examples 8.1.1
and 8.1.2. Prove that the permutation (0, 1, 2, 3, 4, 5, 6) is an automorphism of each
design.
If, as in Examples 8.1.1 and 8.1.2, the supports of the codewords of a fixed weight of
a code are the blocks of a t-design for some t, then we say that the code holds a design.1
Conversely, the row space, over some field, of the incidence matrix of a t-(v, k, λ) design
defines a code. One cannot in general expect that the blocks are precisely the supports of
all the codewords of weight k, but under suitable circumstances this is the case.
Remarkably, if (P, B) is a t-(v, k, λ) design, it is also an i-(v, k, λi ) design for i < t,
where λi is given in the next theorem.
Theorem 8.1.3 Let (P, B) be a t-(v, k, λ) design. Let 0 ≤ i ≤ t. Then (P, B) is an
i-(v, k, λ_i) design, where

    λ_i = λ·C(v−i, t−i)/C(k−i, t−i) = λ·[(v−i)(v−i−1)···(v−t+1)] / [(k−i)(k−i−1)···(k−t+1)],

and C(n, r) denotes the binomial coefficient n choose r.
Proof: Let I ⊆ P, where |I| = i. Let N be the number of blocks that contain I. Define
X = {(T, B) | I ⊆ T ⊆ B with |T| = t and B ∈ B}. We determine |X| in two ways. There
are C(v−i, t−i) subsets of P \ I of size t − i; when I is added to each of these subsets, we get
a t-element set T. As (P, B) is a t-(v, k, λ) design, each of these sets T containing I is in
λ blocks. Therefore,

    |X| = C(v−i, t−i)·λ.

There are N blocks containing I. For each block B containing I, we can choose t − i
elements of B \ I in C(k−i, t−i) ways so that when added to I they form a t-element set T
contained in B and containing I. So,

    |X| = N·C(k−i, t−i).

Equating these two counts for |X|, we see that N depends only on the size of I and the
result follows by solving for N.
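The formula of Theorem 8.1.3 is straightforward to evaluate; a short sketch (the helper name is ours), checked against the 3-(8, 4, 1) design of Example 8.1.2:

```python
from math import comb

def lambda_i(t, v, k, lam, i):
    """lambda_i = lam * C(v-i, t-i) / C(k-i, t-i)  (Theorem 8.1.3)."""
    num = lam * comb(v - i, t - i)
    den = comb(k - i, t - i)
    assert num % den == 0          # integrality is necessary for existence
    return num // den

# The 3-(8, 4, 1) design of Example 8.1.2:
assert [lambda_i(3, 8, 4, 1, i) for i in range(4)] == [14, 7, 3, 1]
```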
Example 8.1.4 Let (P, B) be the 2-(7, 3, 1) symmetric design of Example 8.1.1. In the
notation of Theorem 8.1.3, λ0 = 7, λ1 = 3, and λ2 = 1. The interpretation of these numbers
is as follows: The empty set is in all seven blocks, each point is in three blocks ((P, B)
is a 1-(7, 3, 3) design), and each pair of points is in one block ((P, B) is a 2-(7, 3, 1)
design). Each of these can be confirmed by direct verification from the blocks listed in the
example.
Example 8.1.5 Let (P, B) be the 3-(8, 4, 1) design of Example 8.1.2. In the notation of
Theorem 8.1.3, λ0 = 14, λ1 = 7, λ2 = 3, and λ3 = 1. The interpretation of these numbers
is as follows, each of which can again be verified directly by looking at the blocks: The
empty set is in all 14 blocks, each point is in seven blocks ((P, B) is a 1-(8, 4, 7) design),
1 If several distinct codewords have the same support, these codewords determine a unique block.
each pair of points is in three blocks ((P, B) is a 2-(8, 4, 3) design), and each triple of
points is in one block ((P, B) is a 3-(8, 4, 1) design).
In the preceding examples we see that λ0 is the number of blocks in the design; in
Exercise 437, you are asked to prove that. By definition λ1 is the number of blocks containing
any given point. Thus we have the following theorem.
Theorem 8.1.6 In a t-(v, k, λ) design, the number of blocks is

    b = λ_0 = λ·C(v, t)/C(k, t),

and every point is in exactly

    λ_1 = λ_0·k/v = b·k/v

blocks.
Exercise 437 Let (P, B) be a t-(v, k, λ) design.
(a) Give a direct proof that there are b = λ_0 = λ·C(v, t)/C(k, t) blocks in the design.
(b) Verify that λ_1 = λ_0·k/v = b·k/v.
The number λ1 of blocks containing a given point is called the replication number and
is sometimes denoted r . A 2-(v, k, λ) design is sometimes called a balanced incomplete
block design or a (b, v, r , k, λ) design. The values for b and r are determined from v, k,
and λ using Theorem 8.1.6 with t = 2.
The fact that each λi is an integer implies certain constraints on the parameters t, v, k,
and λ in order for a t-(v, k, λ) design to exist. A main problem in the study of designs is
to determine whether numbers t, v, k, and λ, for which the λi are all integers, are actually
the parameters of a t-(v, k, λ) design. A secondary problem is to classify all designs with
given parameters up to equivalence when such designs exist.
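A necessary existence test is thus to check the integrality of every λ_i; a sketch of such a test (our own helper; note that passing the test does not guarantee that a design exists):

```python
from math import comb

def lambdas_are_integral(t, v, k, lam):
    """Necessary condition for a t-(v, k, lam) design to exist:
    every lambda_i = lam*C(v-i, t-i)/C(k-i, t-i) must be an integer."""
    return all(lam * comb(v - i, t - i) % comb(k - i, t - i) == 0
               for i in range(t + 1))

assert lambdas_are_integral(2, 7, 3, 1)       # the Fano plane parameters pass
assert not lambdas_are_integral(2, 90, 5, 2)  # lambda_1 = 2*89/4 is not an integer
```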
Exercise 438 Show that designs with the following parameters cannot exist:
(a) t = 2, v = 90, k = 5, and λ = 2,
(b) t = 4, v = 10, k = 5, and λ = 5.
8.2 Intersection numbers
There are other integers associated with a t-design (P, B) that describe certain intersection
properties of the design. Let I and J be subsets of P where I ∩ J = ∅. Suppose that |I| = i
and |J| = j. Denote the number of blocks in B that contain I and are disjoint from J by λ_i^j.
The next theorem shows that λ_i^j is independent of the choice of I and J provided i + j ≤ t.
We will also see in this theorem that these numbers satisfy a recursion reminiscent of the
one satisfied by binomial coefficients. These numbers are called intersection numbers for
the design. Certain of the intersection numbers have specific meaning:
• λ_0^0 is the number b of blocks,
• λ_i^0 = λ_i, and
• λ_0^j is the number of blocks not intersecting a given set of points of size j.
Theorem 8.2.1 Let (P, B) be a t-(v, k, λ) design. Let I and J be disjoint subsets of P of
size i and j, respectively. If i + j ≤ t, then λ_i^j is independent of I and J, and for j ≥ 1,

    λ_i^j = λ_i^{j−1} − λ_{i+1}^{j−1}.                                  (8.1)

Also

    λ_i^j = λ·C(v−i−j, k−i)/C(v−t, k−t).                                (8.2)
Proof: We give an indication of the proof and ask the reader to give a formal proof in
Exercise 440. Referring to Figure 8.1, the right-hand "edge" gives the entries λ_i^0, which are
λ_i by definition. By Theorem 8.1.3, λ_i is independent of I for all i ≤ t and so the right-hand
"edge" of the figure is uniquely determined. Let I = {p_1}. All λ_0^0 blocks either contain p_1 or
they do not. The number containing p_1 is λ_1^0; thus the number not containing p_1 is λ_0^0 − λ_1^0.
Hence λ_0^1 = λ_0^0 − λ_1^0 is independent of I. So the second row of the triangle in Figure 8.1 is
uniquely determined. Now consider the third row. Let I = {p_1} and J = {q_1} with p_1 ≠ q_1.
The intersection number λ_1^0 is the number of blocks containing p_1. These blocks fall into
two categories: those that contain q_1 and those that do not. The number containing q_1 is
λ_2^0; hence the number not containing q_1 is λ_1^0 − λ_2^0. Thus the entry λ_1^1 is λ_1^0 − λ_2^0, which is
therefore independent of both I and J. Now let J = {q_1, q_2}. There are λ_0^1 blocks that do
not contain q_2; as this entry is in row 2, it is independent of q_2. Again these blocks either
contain q_1 or they do not. There are λ_1^1 that do, independent of q_1, and again λ_0^2 must be
λ_0^1 − λ_1^1, independent of J. Thus row three of Figure 8.1 is uniquely determined and has the
values claimed. The formal proof of (8.1) follows inductively along similar lines.

                        λ_0^0
                    λ_0^1     λ_1^0
                λ_0^2     λ_1^1     λ_2^0
            λ_0^3     λ_1^2     λ_2^1     λ_3^0
        λ_0^4     λ_1^3     λ_2^2     λ_3^1     λ_4^0
          ...       ...       ...       ...       ...

Figure 8.1 Pascal triangle of a t-(v, k, λ) design.

Using the recurrence relation (8.1), we can determine all λ_i^j with i + j ≤ t from the λ_i^0
with 0 ≤ i ≤ t. For example,

    λ_0^2 = λ_0^1 − λ_1^1 = (λ_0^0 − λ_1^0) − (λ_1^0 − λ_2^0) = λ_0^0 − 2λ_1^0 + λ_2^0 = λ_0 − 2λ_1 + λ_2.

Since the value for λ_i^0 given in (8.2) agrees with the value for λ_i = λ_i^0 given in Theorem 8.1.3,
and since the values of λ_i^j given in (8.2) satisfy the recurrence relation (8.1), as Exercise 439
shows, the theorem follows.
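The recurrence (8.1) and the closed form (8.2) can be cross-checked numerically. A sketch (our own code) that builds the Pascal triangle of a design from its right edge and compares every entry with (8.2) for the 3-(8, 4, 1) design:

```python
from math import comb

def pascal_triangle(t, v, k, lam):
    """tri[j][i] = lambda_i^j for i + j <= t, built from the right edge
    lambda_i^0 = lambda_i via the recurrence (8.1)."""
    edge = [lam * comb(v - i, t - i) // comb(k - i, t - i)
            for i in range(t + 1)]
    tri = [edge]
    for j in range(1, t + 1):
        prev = tri[-1]
        tri.append([prev[i] - prev[i + 1] for i in range(len(prev) - 1)])
    return tri

tri = pascal_triangle(3, 8, 4, 1)
assert tri[0] == [14, 7, 3, 1]
# Cross-check every entry against the closed form (8.2):
assert all(tri[j][i] == comb(8 - i - j, 4 - i) // comb(5, 1)
           for j in range(4) for i in range(4 - j))
```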
Exercise 439 Show that the values of λ_i^j given in (8.2) satisfy the recurrence relation (8.1).
Exercise 440 Give a formal proof of Theorem 8.2.1.
As mentioned previously, (8.1) is reminiscent of the recurrence relation that defines
the Pascal triangle of binomial coefficients; we call the family of intersection numbers
{λ_i^j | i + j ≤ t} the Pascal triangle of (P, B). Figure 8.1 gives a visual representation of
this triangle. Notice that the triangle is determined completely from either "edge," that is,
from the values λ_i^0 = λ_i for 0 ≤ i ≤ t or from the values λ_0^j for 0 ≤ j ≤ t. By iterating
(8.1), we obtain an explicit formula for the entries in the triangle from the values along the
"right edge."
Corollary 8.2.2 If 0 ≤ i + j ≤ t,

    λ_i^j = Σ_{m=i}^{i+j} (−1)^{m−i} · C(j, m−i) · λ_m.
Exercise 441 Prove Corollary 8.2.2. Hint: Show that if

    μ_i^j = Σ_{m=i}^{i+j} (−1)^{m−i} · C(j, m−i) · λ_m,

then μ_i^0 = λ_i and μ_i^j satisfies the recursion of (8.1).
Exercise 442 Find an analogous formula to that of Corollary 8.2.2 involving the "left"
edge λ_0^j of the Pascal triangle. Prove your formula is correct.
Exercise 443 There is a 5-(18, 8, 6) design whose blocks are the supports of the minimum
weight vectors in an [18, 9, 8] Hermitian self-dual code over F4 .
(a) Construct the Pascal triangle for this design.
(b) How many minimum weight vectors are there in the code?
If the design is a Steiner design (that is, λ = 1), it is possible to add k − t new rows to the
Pascal triangle provided we restrict the subsets I and J in our definition of λ_i^j. Let (P, B) be
a t-(v, k, 1) design and suppose B ∈ B. Let I and J be subsets of P where |I| = i, |J| = j,
and I ∩ J = ∅, and assume that there is a block B such that I ∪ J ⊆ B. If t < i + j ≤ k,
we define λ_i^j to be the number of blocks in B containing I and disjoint from J.
Theorem 8.2.3 Let (P, B) be a t-(v, k, 1) design. Let I and J be subsets of P where
|I| = i, |J| = j, and I ∩ J = ∅, and assume that I ∪ J is a subset of some block in B. If
t < i + j ≤ k, then λ_i^j is independent of I and J. Also λ_i^j = λ_i^{j−1} − λ_{i+1}^{j−1}.

Proof: As λ = 1, λ_i^0 = 1 for t < i ≤ k. Thus λ_i^0 is independent of I. The remainder of the
argument follows as in the proof of Theorem 8.2.1.
Notice that λ_i^j does not satisfy (8.2) if t < i + j ≤ k because λ_i^0 does not satisfy that
equation for t < i ≤ k. However, Corollary 8.2.2 does hold whenever i + j ≤ k, as
Exercise 441 shows. If λ = 1, the triangle formed by {λ_i^j | 0 ≤ i + j ≤ k} is called the extended
Pascal triangle of (P, B). Figures 8.2 and 8.3 give the extended Pascal triangles of a
3-(8, 4, 1) design and a 5-(24, 8, 1) design, respectively. The values λ_i^j for t ≤ i + j ≤ k are
connected by double lines to indicate the extended part of the triangle.
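For a Steiner design the extended triangle is obtained by padding the right edge with 1s before applying the recurrence; the following sketch (our own code) reproduces Figure 8.2 for the 3-(8, 4, 1) design, including the bottom row 1 0 2 0 1:

```python
from math import comb

def extended_triangle(t, v, k):
    """Extended Pascal triangle of a t-(v, k, 1) Steiner design:
    right edge is lambda_i^0 for i <= t, and 1 for t < i <= k (Theorem 8.2.3),
    then the recurrence lambda_i^j = lambda_i^{j-1} - lambda_{i+1}^{j-1}."""
    edge = [comb(v - i, t - i) // comb(k - i, t - i) for i in range(t + 1)]
    edge += [1] * (k - t)            # lambda_i^0 = 1 for t < i <= k
    tri = [edge]
    for j in range(1, k + 1):
        prev = tri[-1]
        tri.append([prev[i] - prev[i + 1] for i in range(len(prev) - 1)])
    return tri  # tri[j][i] == lambda_i^j

tri = extended_triangle(3, 8, 4)
assert tri[0] == [14, 7, 3, 1, 1]
# Bottom row (i + j = 4) of Figure 8.2, read left to right:
assert [tri[4 - i][i] for i in range(5)] == [1, 0, 2, 0, 1]
```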
A 3-(8, 4, 1) design was given in Example 8.1.2; the design arose as the set of supports
of the weight 4 vectors in Ĥ3. This code is the only [8, 4, 4] binary code by Exercise 56,
but this does not imply that the design is unique. In the next example, we show that in fact
any 3-(8, 4, 1) design is the set of supports of the weight 4 codewords in an [8, 4, 4] binary
code and hence is unique.
Exercise 444 Verify the entries in Figures 8.2 and 8.3 given that the 3-(8, 4, 1) design has
14 blocks and the 5-(24, 8, 1) design has 759 blocks.
Example 8.2.4 We show that the 3-(8, 4, 1) design is unique up to equivalence. From the
bottom row of the extended Pascal triangle of Figure 8.2, we see that any two blocks meet
in an even number of points. Hence if we associate to each block of the design a vector in
F_2^8 whose support is that block, these vectors generate a self-orthogonal doubly-even binary
code C of length 8 by Theorem 1.4.8(i). By self-orthogonality, the dimension of C is at most
4; as there are 14 blocks, the dimension is 4. Thus C must be one of the equivalent forms
of the [8, 4, 4] extended Hamming code Ĥ3. Thus a 3-(8, 4, 1) design is equivalent to the
design of Example 8.1.2.

            14
          7    7
        3    4    3
      1    2    2    1
    1    0    2    0    1

Figure 8.2 Extended Pascal triangle of a 3-(8, 4, 1) design.

                                759
                             506   253
                          330   176    77
                       210   120    56    21
                    130    80    40    16     5
                  78    52    28    12     4     1
               46    32    20     8     4     0     1
            30    16    16     4     4     0     0     1
         30     0    16     0     4     0     0     0     1

Figure 8.3 Extended Pascal triangle of a 5-(24, 8, 1) design.
8.3 Complementary, derived, and residual designs
If we begin with a t-(v, k, λ) design (P, B), there are three other natural designs that arise
from this design:
• The complementary design for (P, B) is the design (P, B′) where B′ consists of the
  complements of the blocks in B. Theorem 8.2.1 shows that (P, B′) is in fact a t-(v, v − k, λ_0^t)
  design whose Pascal triangle is the reflection, across the vertical line through the
  apex, of the Pascal triangle of (P, B). If the complement of every block is already in the
  design, the design is called self-complementary; in this case of course, k must be v/2. If
  λ = 1 and k ≤ v/2, the Pascal triangle of the complementary design of (P, B) can also
  be extended by k − t rows by reflecting the extended Pascal triangle of (P, B), provided
  the extended entries are interpreted correctly; if t < i + j ≤ k, λ_i^j in the complementary
  design represents the number of blocks containing I and disjoint from J where |I| = i,
  |J| = j, and I ∪ J is disjoint from some block in the complementary design.
• Let x ∈ P be fixed. The derived design for (P, B) with respect to x is the design with
  points P \ {x} and blocks {B \ {x} | x ∈ B ∈ B}. Theorem 8.2.1 shows that the derived
  design is a (t − 1)-(v − 1, k − 1, λ) design whose Pascal triangle is the subtriangle of
  (P, B) consisting of λ_1^0 and the nodes below. If λ = 1, the extended Pascal triangle of the
  derived design is the subtriangle of the extended triangle of (P, B) consisting of λ_1^0 and
  the nodes below.
• Again let x ∈ P be fixed. The residual design for (P, B) with respect to x is the design
  with points P \ {x} whose blocks are B ∈ B not containing x. Again Theorem 8.2.1 shows
  that the residual design is a (t − 1)-(v − 1, k, λ_{t−1} − λ) design whose Pascal triangle is
  the subtriangle of (P, B) consisting of λ_0^1 and the nodes below.
Exercise 445 Verify all the claims made in connection with the definition of complementary, derived, and residual designs.
Exercise 446 Let D = (P, B) be a t-design with t > 1 and choose x ∈ P. Let Dc be the
complementary design of D, Dd the derived design of D with respect to x, and Dr the
residual design of D with respect to x.
(a) Let D1 be the complementary design of Dr and D2 the derived design of Dc with respect
to x. Prove that D1 = D2 .
(b) Let D3 be the complementary design of Dd and D4 the residual design of Dc with
respect to x. Prove that D3 = D4 .
Example 8.3.1 In Example 8.2.4 we showed that a 3-(8, 4, 1) design (P, B) is equivalent
to the design obtained from the weight 4 vectors of Ĥ3, the [8, 4, 4] extended binary
Hamming code. We describe the complementary, derived, and residual designs obtained
from (P, B):
• Since the complement of every block in (P, B) is already in (P, B), the design is
  self-complementary. Notice that this is related to the fact that Ĥ3 contains the all-one
  codeword and the blocks have size half the length of the code.
• There are λ_1^0 = 7 blocks of (P, B) that contain any given point x. The 2-(7, 3, 1) derived
  design, obtained by deleting x from these seven blocks, gives the supports of the weight
  3 vectors in the code punctured on the coordinate represented by x. This design is the
  projective plane of order 2 and is equivalent to the design obtained in Example 8.1.1.
• There are λ_0^1 = 7 blocks of (P, B) that do not contain any given point x. The 2-(7, 4, 2)
  residual design with respect to x consists of the supports of the weight 4 vectors in the
  code punctured on the coordinate represented by x.
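The derived and residual designs of this example can be generated directly from the blocks; a sketch (our own construction of the S(3, 4, 8) blocks from Example 8.1.2, with ∞ coded as the point 7):

```python
from itertools import combinations

# Blocks of the S(3,4,8) of Example 8.1.2: Fano triples with 7 adjoined,
# plus their complements inside {0,...,6}.
fano = [frozenset({i % 7, (i + 1) % 7, (i + 3) % 7}) for i in range(7)]
blocks = [B | {7} for B in fano] + [frozenset(range(7)) - B for B in fano]

x = 7
derived = [B - {x} for B in blocks if x in B]     # a 2-(7, 3, 1) design
residual = [B for B in blocks if x not in B]      # a 2-(7, 4, 2) design

# The derived design is exactly the Fano plane of Example 8.1.1 ...
assert sorted(map(sorted, derived)) == sorted(map(sorted, fano))
# ... and every pair of the remaining 7 points is in lambda = 2 residual blocks.
for pair in combinations(range(7), 2):
    assert sum(1 for B in residual if set(pair) <= B) == 2
```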
Exercise 447 Find the parameters of the complementary, residual, and derived designs
obtained from the 5-(18, 8, 6) design of Exercise 443. Also give the Pascal triangle of each
of these designs.
Example 8.3.2 In this example, we show that any 5-(24, 8, 1) design is held by the codewords of weight 8 in a [24, 12, 8] self-dual binary code. We, however, do not know that
this design even exists. We will show in Example 8.4.3 that the design indeed does exist. In
Section 10.1, we will show that both the design and the code are unique up to equivalence;
the code is the extended binary Golay code.
Let (P, B) be such a 5-(24, 8, 1) design. A subset of P of size 4 will be called a tetrad. If
T is a tetrad and p1 is a point not in T , there is a unique block B ∈ B containing T ∪ { p1 }
because (P, B) is a Steiner 5-design. Let T1 be B \ T . Thus T1 is the unique tetrad disjoint
from T containing p1 such that T ∪ T1 ∈ B. Letting p2 be a point not in T ∪ T1 , we can
similarly find a unique tetrad T2 containing p2 such that T ∩ T2 = ∅ and T ∪ T2 ∈ B. As
(P, B) is a Steiner 5-design, if T ∪ T1 and T ∪ T2 have five points in common, they must
be equal implying T1 = T2 , which is a contradiction as p2 ∈ T2 \ (T ∪ T1 ); so T1 ∩ T2 = ∅.
Continuing in this manner we construct tetrads T3 , T4 , and T5 such that T , T1 , . . . , T5 are
pairwise disjoint and T ∪ Ti ∈ B for 1 ≤ i ≤ 5; the tetrads T , T1 , . . . , T5 are called the
sextet determined by T .
Let C be the binary linear code of length 24 spanned by vectors whose supports are the
blocks in B. By examining the last line of Figure 8.3, we see that two distinct blocks intersect
in either 0, 2, or 4 points because λ_i^{8−i} is nonzero only for i = 0, 2, 4, or 8; λ_8^0 = 1 simply
says that only one block of size 8 contains the eight points of that block. In particular, C
is self-orthogonal and therefore doubly-even, by Theorem 1.4.8(i). We prove that C has
minimum weight 8. If it does not, it has minimum weight 4. Suppose c ∈ C has weight 4
and support S. Let T be a tetrad intersecting S in exactly three places. Constructing the
sextet determined by T , we obtain codewords ci with support T ∪ Ti for 1 ≤ i ≤ 5. As
|S| = 4 and the tetrads are pairwise disjoint, we have S ∩ Ti = ∅ for some i; therefore
wt(c + ci ) = 6, which is a contradiction as C is doubly-even. Hence C is a self-orthogonal
code with minimum weight d = 8.
The support of any codeword in C of weight 8 is called an octad. We next show that the
octads are precisely the blocks. Clearly, the blocks of B are octads. Let c be a codeword in
C of weight 8 with support S. We show S is a block of B. Fix five points of S. There is a
unique block B ∈ B containing these five points; as blocks are octads, there is a codeword
b ∈ C with support B. Since wt(c + b) ≤ 6, c = b as c + b ∈ C and C has minimum weight
8. So S = B and hence every octad is a block of B.
Since C is self-orthogonal, it has dimension at most 12. We show that it has dimension
exactly 12. We do this by counting the number of cosets of C in F_2^24. There is a unique
coset leader in cosets of weight i, 0 ≤ i ≤ ⌊(d − 1)/2⌋ = 3, by Exercise 66. Thus for
0 ≤ i ≤ 3, there are C(24, i) cosets of weight i, accounting for 2325 cosets. Let v be a coset
leader in a coset of weight 4; suppose its support is the tetrad T. If w is another coset leader
of this coset, then v + w must be a codeword and hence must be of weight 8. Thus the
other coset leaders are determined once we find all weight 8 codewords whose supports
contain T. Consider the sextet determined by T. If c_i ∈ C has support T ∪ T_i, the only
codewords of weight 8 whose support contains T are c_1, . . . , c_5 because all octads are
blocks and the sextet determined by T is unique. So there are exactly six coset leaders in
the coset, namely v, v + c_1, . . . , v + c_5. Therefore there are C(24, 4)/6 = 1771 cosets of weight
4, and hence 2^12 cosets of weight 4 or less. There are no cosets of weight 5 because, if
v has weight 5, there is a codeword c of weight 8 such that wt(v + c) = 3 as (P, B) is
a 5-design, which is a contradiction. By Theorem 1.12.6(v), the covering radius of C is
4. Therefore C is a [24, 12, 8] self-dual code whose codewords of weight 8 support the
blocks B.
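The coset count at the heart of this dimension argument is elementary arithmetic; a sketch:

```python
from math import comb

# Coset census from the example: unique leaders in weights 0..3,
# six leaders per weight-4 coset, and no cosets of weight 5 or more.
low = sum(comb(24, i) for i in range(4))   # cosets of weight <= 3
w4 = comb(24, 4) // 6                      # cosets of weight exactly 4
assert (low, w4) == (2325, 1771)
assert low + w4 == 2 ** 12                 # so C has 2^12 cosets: dim C = 12
```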
Example 8.3.3 Let C be the [24, 12, 8] extended binary Golay code, and let D = (P, B)
be the 5-(24, 8, 1) design held by the weight 8 codewords of C. (See Example 8.4.3 where
we show that the weight 8 codewords of C indeed hold a 5-design.) We produce a number
of designs related to D.
• The complementary design D0 for D is a 5-(24, 16, 78) design, since λ_0^5 = 78. As the
  all-one vector is in C, this design must be held by the weight 16 codewords of C. The extended
  Pascal triangle for D0 (with the appropriate interpretation for the extended nodes) is the
  extended Pascal triangle for D reflected across the vertical line through the apex.
• Puncturing C with respect to any coordinate p1 gives the [23, 12, 7] binary Golay code
  C1. The weight 7 codewords of C1 must come from the weight 8 codewords of C with
  p1 in their support. Thus the weight 7 codewords of C1 hold the 4-(23, 7, 1) design D1
  derived from D with respect to p1. The extended Pascal triangle of D1 is the subtriangle
  with apex λ_1^0 = 253 (the number of blocks of D1) obtained from Figure 8.3. Notice that
  by Exercise 403, C1 indeed has 253 weight 7 codewords.
• The weight 8 codewords of C1 must come from the weight 8 codewords of C that do
  not have p1 in their support. Thus the weight 8 codewords of C1 hold the 4-(23, 8, 4)
  design D2, which is the residual design for D with respect to p1. The Pascal triangle of
  D2 is the (non-extended) subtriangle with apex λ_0^1 = 506 (the number of blocks of D2)
  obtained from Figure 8.3. Notice again that by Exercise 403, C1 indeed has 506 weight 8
  codewords.
• The complementary design for D1 is the 4-(23, 16, 52) design D3 held by the weight 16
  codewords of C1, again because C1 contains the all-one codeword.
• The complementary design for D2 is the 4-(23, 15, 78) design D4 held by the weight 15
  codewords of C1. The design D4 is also the derived design of D0 with respect to p1; see
  Exercise 446.
• Puncturing C1 with respect to any coordinate p2 gives the [22, 12, 6] code C2. The weight
  6 codewords of C2 hold the 3-(22, 6, 1) design D5 derived from D1 with respect to p2;
  the extended Pascal triangle of D5 is the subtriangle with apex λ_2^0 = 77 obtained from
  Figure 8.3.
• The weight 16 codewords of C2 hold the 3-(22, 16, 28) design D6, which is both the
  complementary design for D5 and the residual design for D3 with respect to p2; again
  see Exercise 446.
• The weight 8 codewords of C2 hold the 3-(22, 8, 12) design D7, which is the residual
  design for D2 with respect to p2, whose Pascal triangle is the (non-extended) subtriangle
  with apex λ_0^2 = 330 obtained from Figure 8.3.
• The weight 14 codewords of C2 hold the 3-(22, 14, 78) design D8, which is the
  complementary design for D7 and the derived design of D4 with respect to p2.
• The supports of the weight 7 codewords of C2 are the union of the blocks of two different
  3-designs. The weight 7 codewords of C2 arise from the codewords of weight 8 in C whose
  supports contain exactly one of p1 or p2. Thus the supports of the weight 7 codewords of
  C2 are either blocks in the 3-(22, 7, 4) residual design obtained from D1 with respect to
  p2 or blocks in the 3-(22, 7, 4) design derived from D2 with respect to p2. Hence there
  are 352 weight 7 codewords in C2 that hold a 3-(22, 7, 8) design D9 whose Pascal triangle
  is the (non-extended) subtriangle with apex λ_1^1 = 176 obtained from Figure 8.3 with all
  entries doubled.
The next two examples show how to use designs to find the weight distributions of the
punctured codes C 1 and C 2 of Example 8.3.3 obtained from the [24, 12, 8] extended binary
Golay code.
Example 8.3.4 In Exercise 403, we found the weight distribution of C1 using Prange's
Theorem. As an alternate approach, the weight distribution of C1 can be obtained by
examining the Pascal triangles for the designs that arise here. We know that A_i(C1) = A_{23−i}(C1),
as C1 contains the all-one codeword. Obviously, A_0(C1) = A_23(C1) = 1. The number of
weight 7 codewords in C1 is the number of blocks in D1, which is the apex of its Pascal
triangle (and the entry λ_1^0 of Figure 8.3); thus A_7(C1) = A_16(C1) = 253. The number of
weight 8 codewords is the number of blocks in D2, which is the apex of its Pascal triangle
(and the entry λ_0^1 of Figure 8.3); thus A_8(C1) = A_15(C1) = 506. We complete the weight
distribution by noting that A_11(C1) = A_12(C1) = 2^11 − 506 − 253 − 1 = 1288.
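The bookkeeping in this example amounts to the following sketch (the dictionary layout is ours):

```python
# Weight distribution of the [23, 12, 7] punctured Golay code C1, following
# Example 8.3.4: A_i = A_{23-i}, with block counts 253 and 506 from the designs.
A = {0: 1, 7: 253, 8: 506}
A[11] = 2 ** 11 - sum(A.values())          # the remaining self-paired half
for i, a in list(A.items()):
    A[23 - i] = a
assert A[11] == A[12] == 1288
assert sum(A.values()) == 2 ** 12          # all 4096 codewords accounted for
```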
Example 8.3.5 A similar computation can be done to compute the weight distribution
of C2. Again since C2 contains the all-one codeword, A_i(C2) = A_{22−i}(C2) and A_0(C2) =
A_22(C2) = 1. The number of weight 6 codewords in C2 is the number of blocks in D5, which
is the entry λ_2^0 of Figure 8.3, implying A_6(C2) = A_16(C2) = 77. The number of weight 7
codewords is the number of blocks in D9, which is two times the entry λ_1^1 of Figure 8.3;
thus A_7(C2) = A_15(C2) = 352. The number of weight 8 codewords is the number of blocks
in D7, which is the entry λ_0^2 of Figure 8.3; hence A_8(C2) = A_14(C2) = 330. We will see
later that the weight 12 codewords of C hold a 5-design. The Pascal triangle for that design
is given in Figure 8.4. The number of weight 10 codewords in C2 must be λ_2^0 from that
figure. So A_10(C2) = A_12(C2) = 616. Finally, A_11(C2) = 2^12 − 2·(1 + 77 + 352 + 330 +
616) = 1344.
Exercise 448 Find the Pascal triangle or the extended Pascal triangle, whichever is appropriate, for the designs D0 , D1 , . . . , D9 described in Example 8.3.3.
Examples 8.3.1 and 8.3.3 illustrate the following result, whose proof is left as an exercise.
Theorem 8.3.6 Let C be a binary linear code of length n such that the vectors of weight w
hold a t-design D = (P, B). Let x ∈ P and let C ∗ be C punctured on coordinate x.
(i) If the all-one vector is in C, then the vectors of weight n − w hold a t-design, and this
design is the complementary design of D.
(ii) If C has no vectors of weight w + 1, then the residual (t − 1)-design of D with respect
to x is the design held by the weight w vectors in the code C∗.
(iii) If C has no vectors of weight w − 1, then the (t − 1)-design derived from D with respect
to x is the design held by the weight w − 1 vectors in the code C∗.
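Part (i) can be observed on the [8, 4, 4] extended Hamming code of Example 8.3.1; the generator matrix below is one equivalent form (our own choice of rows), and since the all-one vector lies in the code, the weight 4 supports are closed under complementation:

```python
from itertools import product

# A generator matrix for (one equivalent form of) the [8, 4, 4] extended
# Hamming code: cyclic shifts of g(x) = 1 + x + x^3 with an overall
# parity bit appended. The row choice is ours.
G = [(1, 1, 0, 1, 0, 0, 0, 1),
     (0, 1, 1, 0, 1, 0, 0, 1),
     (0, 0, 1, 1, 0, 1, 0, 1),
     (0, 0, 0, 1, 1, 0, 1, 1)]

def span(rows):
    """All GF(2) linear combinations of the rows."""
    words = set()
    for coeffs in product([0, 1], repeat=len(rows)):
        w = [0] * len(rows[0])
        for c, row in zip(coeffs, rows):
            if c:
                w = [a ^ b for a, b in zip(w, row)]
        words.add(tuple(w))
    return words

C = span(G)
assert (1,) * 8 in C                      # the all-one vector lies in C
supports = {frozenset(i for i, ci in enumerate(w) if ci)
            for w in C if sum(w) == 4}
# Theorem 8.3.6(i): supports of weight n - w = 4 vectors form the
# complementary design; here it coincides with the original design.
assert {frozenset(range(8)) - S for S in supports} == supports
```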
Exercise 449 Prove Theorem 8.3.6.
There is no known general result that characterizes all codes whose codewords of a given
weight, such as minimum weight, hold a t-design. There are, however, results that show
that codes satisfying certain conditions have weights where codewords of that weight hold
a design. The best known of these is the Assmus–Mattson Theorem presented in the next
section. It is this result that actually guarantees (without directly showing that supports of
codewords satisfy the conditions required in the definition of a t-design) that the weight 4
codewords of the [8, 4, 4] extended binary Hamming code hold a 3-(8, 4, 1) design and
the weight 8 codewords of the [24, 12, 8] extended binary Golay code hold a 5-(24, 8, 1)
design.
8.4 The Assmus–Mattson Theorem
If the weight distribution of a code and its dual are of a particular form, a powerful result due
to Assmus and Mattson [6] guarantees that t-designs are held by codewords in both the code
and its dual. In fact, the Assmus–Mattson Theorem has been the main tool in discovering
designs in codes.
For convenience, we first state the Assmus–Mattson Theorem for binary codes. We prove
the general result, as the proof for binary codes is not significantly simpler than the proof
for codes over an arbitrary field.
Theorem 8.4.1 (Assmus–Mattson) Let C be a binary [n, k, d] code. Suppose C⊥ has
minimum weight d⊥. Suppose that A_i = A_i(C) and A_i^⊥ = A_i(C⊥), for 0 ≤ i ≤ n, are
the weight distributions of C and C⊥, respectively. Fix a positive integer t with t < d, and
let s be the number of i with A_i^⊥ ≠ 0 for 0 < i ≤ n − t. Suppose s ≤ d − t. Then:
(i) the vectors of weight i in C hold a t-design provided A_i ≠ 0 and d ≤ i ≤ n, and
(ii) the vectors of weight i in C⊥ hold a t-design provided A_i^⊥ ≠ 0 and d⊥ ≤ i ≤ n − t.
Theorem 8.4.2 (Assmus–Mattson) Let C be an [n, k, d] code over F_q. Suppose C⊥ has
minimum weight d⊥. Let w be the largest integer with w ≤ n satisfying

    w − ⌊(w + q − 2)/(q − 1)⌋ < d.

(So w = n when q = 2.) Define w⊥ analogously using d⊥. Suppose that A_i = A_i(C) and
A_i^⊥ = A_i(C⊥), for 0 ≤ i ≤ n, are the weight distributions of C and C⊥, respectively. Fix a
positive integer t with t < d, and let s be the number of i with A_i^⊥ ≠ 0 for 0 < i ≤ n − t.
Suppose s ≤ d − t. Then:
(i) the vectors of weight i in C hold a t-design provided A_i ≠ 0 and d ≤ i ≤ w, and
(ii) the vectors of weight i in C⊥ hold a t-design provided A_i^⊥ ≠ 0 and d⊥ ≤ i ≤
min{n − t, w⊥}.
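The quantity w is determined by simple arithmetic from n, d, and q; a sketch (the helper name is ours):

```python
def am_w(n, d, q):
    """Largest w <= n with w - floor((w + q - 2)/(q - 1)) < d (Theorem 8.4.2).
    The left-hand side is nondecreasing in w, so a linear scan suffices."""
    best = None
    for w in range(n + 1):
        if w - (w + q - 2) // (q - 1) < d:
            best = w
    return best

assert am_w(24, 8, 2) == 24    # binary: the condition always holds, so w = n
assert am_w(18, 8, 4) == 11    # e.g. the parameters of the code in Exercise 443
```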
Proof: Let T be any set of t coordinate positions, and let C_T be the code of length n − t
obtained from C by puncturing on T. Let C⊥(T) be the subcode of C⊥ that is zero on T,
and let (C⊥)_T be the code C⊥ shortened on T. Since t < d, it follows from Theorem 1.5.7
that C_T is an [n − t, k, d_T] code with d_T ≥ d − t and (C_T)⊥ = (C⊥)_T.
Let A_i′ = A_i(C_T) and A_i′^⊥ = A_i((C_T)⊥) = A_i((C⊥)_T), for 0 ≤ i ≤ n − t, be the weight
distributions of C_T and (C_T)⊥, respectively. As s ≤ d − t ≤ d_T, A_i′ = 0 for 1 ≤ i ≤ s − 1.
If S = {i | A_i^⊥ ≠ 0, 0 < i ≤ n − t}, then, as A_i′^⊥ ≤ A_i^⊥ and |S| = s, the A_i′^⊥ are unknown
only for i ∈ S. These facts about A_i′ and A_i′^⊥ are independent of the choice of T. By
Theorem 7.3.1, there is a unique solution for all A_i′ and A_i′^⊥, which must therefore be the
same for each set T of size t. The weight distribution of C⊥(T) is the same as the weight
distribution of (C⊥)_T; hence the weight distribution of C⊥(T) is the same for all T of size t.
In a code over any field, two codewords of minimum weight with the same support must
be scalar multiples of each other by Exercise 451. Let B be the set of supports of the vectors
in C of weight d. Let T be a set of size t. The codewords in C of weight d, whose support
304
Designs
contains T , are in one-to-one correspondence with the vectors in C T of weight d − t. There
are A′d−t such vectors in C T and hence A′d−t /(q − 1) blocks in B containing T . Thus the
codewords of weight d in C hold a t-design.
We prove the rest of (i) by induction. Assume that the codewords of weight x in C
with A x ≠ 0 and d ≤ x ≤ z − 1 < w, for some integer z, hold t-designs. Suppose the
intersection numbers of these designs are λ_i^j (x). If d ≤ x ≤ z − 1 < w but A x = 0, set
λ_i^j (x) = 0. By Exercise 451 the value w has been chosen to be the largest possible weight
so that a codeword of weight w or less in C is determined uniquely up to scalar multiplication
by its support. If A z ≠ 0, we show that the codewords of weight z in C hold a t-design.
Suppose that there are N (T ) codewords in C of weight z whose support contains T . Every
vector in C of weight z with support containing T is associated with a vector of weight z − t
in C T . However, every vector of weight z − t in C T is associated with a vector of weight
z − ℓ in C whose support intersects T in a set of size t − ℓ for 0 ≤ ℓ ≤ z − d. A calculation
completed in Exercise 452 shows that
A′_{z−t} = N (T ) + (q − 1) Σ_{ℓ=1}^{z−d} (t choose ℓ) λ_{t−ℓ}^ℓ (z − ℓ).
Therefore, N (T ) is independent of T , and hence the codewords of weight z in C hold a
t-design. Thus (i) holds by induction.
Let d ⊥ ≤ i ≤ min{n − t, w ⊥ }. Codewords in C ⊥ of weight w ⊥ or less are determined
uniquely up to scalar multiplication by their supports by Exercise 451. Let B be the set of
all supports of codewords in C ⊥ of weight i, and let B ′ be their complements. Let B ′T be
the set of blocks in B ′ that contain T . These blocks are in one-to-one correspondence with
the supports of codewords of weight i which are zero on T , that is, codewords of weight
i in C ⊥ (T ). The number of blocks in B ′T is independent of T as the weight distribution of
C ⊥ (T ) is independent of T . Therefore |B ′T | is independent of T , and B ′ is the set of blocks
in a t-design. Hence B is the set of blocks in a t-design. This proves (ii).
Exercise 450 Show that in the [12, 6, 6] extended ternary Golay code, two codewords,
both of weight 6 or both of weight 9, with the same support must be scalar multiples of
each other.
Exercise 451 Let C be a code over Fq of minimum weight d.
(a) Let c and c′ be two codewords of weight d with supp(c) = supp(c′ ). Show that c = αc′
for some nonzero α in Fq .
(b) Let w be the largest integer with w ≤ n satisfying
w − ⌊(w + q − 2)/(q − 1)⌋ < d.
Show that if c and c′ are two codewords of weight i with d ≤ i ≤ w and supp(c) =
supp(c′ ), then c = αc′ for some nonzero α in Fq .
(c) Let w be defined as in part (b). You are to show that w is the largest integer such that a
codeword of weight w or less in an [n, k, d] code over Fq is determined uniquely up to
scalar multiplication by its support. Do this by finding two vectors in F_q^{w+1} of weight
w + 1 that generate a [w + 1, 2, d] code over Fq .
305
8.4 The Assmus–Mattson Theorem
Exercise 452 Show that in the notation of the proof of the Assmus–Mattson Theorem,
A′_{z−t} = N (T ) + (q − 1) Σ_{ℓ=1}^{z−d} (t choose ℓ) λ_{t−ℓ}^ℓ (z − ℓ).
Example 8.4.3 Let C be the [24, 12, 8] self-dual extended binary Golay code. C has codewords of weight 0, 8, 12, 16, and 24 only. In the notation of the Assmus–Mattson Theorem,
n = 24 and d = d ⊥ = 8. As C is self-dual, Ai = Ai⊥ . The values of s and t must satisfy
t < 8 and s ≤ 8 − t, where s = |{i | 0 < i ≤ 24 − t and Ai⊥ ≠ 0}|; the highest value of t
that satisfies these conditions is t = 5, in which case s = 3. Therefore the vectors of weight
i in C ⊥ = C hold 5-designs for i = 8, 12, and 16. From the weight distribution of the Golay
code given in Exercise 384, we know that A12 = 2576. Hence the value of λ for the 5-design
held by the codewords of weight 12 is λ = 2576 (12 choose 5)/(24 choose 5) = 48. In Figure 8.4, we give the
Pascal triangle for the 5-(24, 12, 48) design held by the weight 12 vectors in C.
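The arithmetic behind λ = 48 is easy to replicate: counting incidences between blocks and t-subsets gives b (k choose t) = λ (n choose t). A quick Python check:

```python
from math import comb

b = 2576                  # weight-12 codewords of the binary Golay code, one block each
n, k, t = 24, 12, 5
# b * C(k, t) = lam * C(n, t), so lam = b * C(k, t) / C(n, t)
assert b * comb(k, t) % comb(n, t) == 0   # the division is exact
lam = b * comb(k, t) // comb(n, t)        # 48
```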
Exercise 453 Verify the entries in Figure 8.4 using the fact that there are 2576 blocks in
the 5-(24, 12, 48) design.
Example 8.4.4 Let C be the [12, 6, 6] self-dual extended ternary Golay code presented in
Section 1.9.2; its weight enumerator is given in Example 7.3.2. C has codewords of weight
0, 6, 9, and 12 only. In the notation of the Assmus–Mattson Theorem, w = 11, and the
highest value of t that satisfies the hypothesis is t = 5; hence s = 1. Thus the codewords
in C of weights 6 and 9 hold 5-designs. These designs have parameters 5-(12, 6, 1) and
5-(12, 9, 35), respectively.
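The parameters in Exercise 454 follow from the same incidence count; here A6 = 264 and A9 = 440 are the weight-distribution values from Example 7.3.2, and each support corresponds to q − 1 = 2 codewords. A brief Python sketch:

```python
from math import comb

# (Ai, weight, expected lambda) for the 5-designs held by the ternary Golay code
for Ai, i, expected in [(264, 6, 1), (440, 9, 35)]:
    blocks = Ai // 2                            # q - 1 = 2 codewords per support
    lam = blocks * comb(i, 5) // comb(12, 5)    # blocks * C(i, 5) / C(12, 5)
    assert lam == expected
```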
Exercise 454 Verify that the parameters of the 5-designs held by weight 6 and weight
9 codewords of the [12, 6, 6] extended ternary Golay code are 5-(12, 6, 1) and
5-(12, 9, 35).
Exercise 455 Find the Pascal triangles for designs with the following parameters:
(a) 5-(12, 6, 1),
(b) 5-(12, 9, 35).
                         2576
                     1288    1288
                  616     672     616
               280     336     336     280
            120     160     176     160     120
          48     72      88      88      72     48
Figure 8.4 Pascal triangle of a 5-(24, 12, 48) design.
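The entries of Figure 8.4 (and Exercise 453) follow from the left edge λ_i^0 = λ C(n − i, t − i)/C(k − i, t − i) together with the Pascal-style recurrence λ_i^j = λ_i^{j−1} − λ_{i+1}^{j−1}. A small Python sketch (the function name is ours):

```python
from math import comb

def intersection_triangle(n, k, lam, t):
    """Intersection numbers lam_i^j of a t-(n, k, lam) design, for i + j <= t."""
    # left edge: lam_i^0 = lam * C(n-i, t-i) / C(k-i, t-i)
    tri = {(i, 0): lam * comb(n - i, t - i) // comb(k - i, t - i)
           for i in range(t + 1)}
    for j in range(1, t + 1):
        for i in range(t - j + 1):
            # blocks on i points avoiding j points, by the Pascal recurrence
            tri[(i, j)] = tri[(i, j - 1)] - tri[(i + 1, j - 1)]
    return tri

tri = intersection_triangle(24, 12, 48, 5)
# bottom row of Figure 8.4: 48, 72, 88, 88, 72, 48
bottom = [tri[(5 - j, j)] for j in range(6)]
```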
Exercise 456 In Section 1.10 we introduced the [2^m, m + 1] Reed–Muller codes R(1, m).
In Exercise 62 you showed that a generator matrix for R(1, m) can be obtained from a
generator matrix of the [2^m − 1, m] simplex code S_m by adjoining a column of 0s and
then adding the all-one row to the matrix. The weight distribution of S_m is A0 (S_m) = 1
and A_{2^{m−1}} (S_m) = 2^m − 1 by Theorem 2.7.5. Therefore R(1, m) has weight distribution
A0 (R(1, m)) = 1, A_{2^{m−1}} (R(1, m)) = 2^{m+1} − 2, and A_{2^m} (R(1, m)) = 1. In addition, by
Theorem 1.10.1 and Exercise 62, R(1, m)⊥ = R(m − 2, m) = Ĥ_m, which is a
[2^m, 2^m − m − 1, 4] code.
(a) Prove that the codewords of weight 2^{m−1} in R(1, m) hold a 3-(2^m, 2^{m−1}, λ) design.
(b) Prove that λ = 2^{m−2} − 1 in the 3-design of part (a). Hint: Use Theorem 8.1.6.
(c) Prove that the codewords of weight i with 4 ≤ i ≤ 2^m − 4 in Ĥ_m hold 3-designs if
A_i (Ĥ_m) ≠ 0.
(d) Find the parameter λ of the 3-(8, 4, λ) design held by the words of weight 4 in Ĥ_3.
(e) Find the weight distribution of the [16, 11, 4] code Ĥ_4. Hint: See Exercise 387.
(f ) Find the parameters (block size, number of blocks, and λ) of all the 3-designs held by
codewords of fixed weight in Ĥ_4.
In Examples 8.3.4 and 8.3.5 we saw how to use designs to compute weight distributions
of codes. We next use designs to compute coset weight distributions.
Example 8.4.5 In Example 1.11.7 we gave the complete coset distribution of the [8, 4, 4]
self-dual doubly-even binary code Ĥ_3. This distribution is easily obtained by using the
intersection numbers for the 3-(8, 4, 1) design given in Figure 8.2. For example, in a coset
of weight 1, the number of weight 3 vectors is the number of blocks of the 3-(8, 4, 1)
design containing a specific point; this value is λ_1^0 = 7. The number of weight 5 vectors is
the number of blocks of the 3-(8, 4, 1) design not containing a specific point; this value
is λ_0^1 = 7. Note that because Ĥ_3 contains the all-one vector, the number of weight 3 and
weight 5 vectors must indeed agree. The remaining two vectors in the weight 1 coset must
of course be a single weight 1 vector and a single weight 7 vector. Now consider a coset
of weight 2. The weight 2 vectors in this coset come from adding the coset leader to 0 or
adding the coset leader to a weight 4 vector in Ĥ_3 whose support contains the support of
the coset leader. Thus a weight 2 coset has 1 + λ_2^0 = 4 vectors of weight 2. The weight 6
vectors in this coset come from adding the coset leader to the all-one codeword or adding
the coset leader to a weight 4 vector in Ĥ_3 whose support is disjoint from the support of the
coset leader. Thus a weight 2 coset has 1 + λ_0^2 = 4 vectors of weight 6. This leaves eight
vectors of weight 4 in the coset. These weight 4 vectors come from a codeword of weight 4
whose support contains precisely one of the coordinates in the support of the coset leader.
As there are two choices for this coordinate, the number of weight 4 vectors in a coset of
weight 2 is 2λ_1^1 = 8, as claimed.
Example 8.4.6 In this example, we compute the weight distribution of a weight 2 coset
of the [24, 12, 8] extended binary Golay code. Let S be a coset of weight 2. This coset
has only even weight vectors. As the code contains the all-one codeword, we know that
Ai (S) = A24−i (S); so we only compute Ai (S) for i ≤ 12. We use Figures 8.3 and 8.4 to do
this computation. Let λ_i^j (8) and λ_i^j (12) denote the intersection numbers of the 5-(24, 8, 1)
Table 8.1 Coset distribution of the [24, 12, 8] binary Golay code
Coset                     Number of vectors of given weight                          Number
weight   0   1   2   3   4    5    6     7     8     9    10     11     12        of cosets
  0      1   0   0   0   0    0    0     0   759     0     0      0   2576             1
  1      0   1   0   0   0    0    0   253     0   506     0   1288      0            24
  2      0   0   1   0   0    0   77     0   352     0   946      0   1344           276
  3      0   0   0   1   0   21    0   168     0   640     0   1218      0          2024
  4      0   0   0   0   6    0   64     0   360     0   960      0   1316          1771
design and the 5-(24, 12, 48) design, respectively. A weight 2 codeword in S is obtained
only by adding the coset leader to 0. So A2 (S) = 1. There can be no weight 4 vectors in
the coset. The weight 6 vectors come from the weight 8 codewords whose supports contain
the support of the coset leader; thus A6 (S) = λ_2^0 (8) = 77. The weight 8 vectors come
from the weight 8 codewords whose supports contain exactly one of the two coordinates
of the support of the coset leader; thus A8 (S) = 2λ_1^1 (8) = 352. The weight 10 vectors
come from the weight 8 codewords whose supports are disjoint from the support of the
coset leader and the weight 12 codewords whose supports contain the support of the coset
leader; thus A10 (S) = λ_0^2 (8) + λ_2^0 (12) = 946. Finally, A12 (S) = 2^12 − 2(1 + 77 + 352 +
946) = 1344. The weight distributions of the cosets of weights 1, 3, and 4 are calculated in
Exercise 457 with the final results given in Table 8.1. The number of cosets of each weight
was determined in Example 8.3.2.
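The intersection numbers used above can also be generated by inclusion–exclusion, λ_i^j = Σ_h (−1)^h C(j, h) λ_{i+h}, which makes the weight-2 coset computation easy to reproduce. A sketch in Python (the helper `inter` is ours, not from the text):

```python
from math import comb

def inter(n, k, lam, t, i, j):
    """Intersection number lam_i^j of a t-(n, k, lam) design (i + j <= t)."""
    def l(m):   # lam_m = lam * C(n-m, t-m) / C(k-m, t-m)
        return lam * comb(n - m, t - m) // comb(k - m, t - m)
    # inclusion-exclusion over the j avoided points
    return sum((-1) ** h * comb(j, h) * l(i + h) for h in range(j + 1))

A6 = inter(24, 8, 1, 5, 2, 0)                                 # 77
A8 = 2 * inter(24, 8, 1, 5, 1, 1)                             # 352
A10 = inter(24, 8, 1, 5, 0, 2) + inter(24, 12, 48, 5, 2, 0)   # 330 + 616 = 946
A12 = 2 ** 12 - 2 * (1 + A6 + A8 + A10)                       # 1344
```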
Exercise 457 Verify that the weight distributions of the weight 1, 3, and 4 cosets of the
[24, 12, 8] extended binary Golay code are as presented in Table 8.1.
The Assmus–Mattson Theorem applied to C is most useful when C ⊥ has only a few
nonzero weights. This occurs for certain self-dual codes over F2 , F3 , and F4 , as we will
examine in more depth in Chapter 9. These are the cases to which the Assmus–Mattson
Theorem has been most often applied. Curiously, the highest value of t for which t-designs
have been produced from this theorem is t = 5 even though t-designs for higher values
of t exist. A great many 5-designs have, in fact, been discovered as a consequence of this
theorem. Designs held by codes may be stronger than predicted by the Assmus–Mattson
Theorem. For example, there is a [22, 12, 8] Hermitian self-dual code over F4 whose weight
8 vectors hold a 2-design when the Assmus–Mattson Theorem says that the weight 8 vectors
hold a 1-design [145].
Table 6.1 of Chapter 6 lists the binary odd-like duadic codes of length n ≤ 119. The
extended codes have the property that they have the same weight distribution as their duals;
such codes are called formally self-dual. From the table, the number of nonzero weights
in the extended codes can be evaluated. Using this information with the Assmus–Mattson
Theorem, we find that the codewords of any weight (except 0) in the extended code hold t-designs for the
following cases: n + 1 = 18 and t = 2; n + 1 = 8, 32, 80, or 104 and t = 3; n + 1 = 24
or 48 and t = 5.
We conclude this section with a result that shows that t-designs are held by codewords
of a code C of a fixed weight if the automorphism group ΓAut(C) is t-transitive. Recall that
ΓAut(C) is t-transitive if for every pair of t-element ordered sets of coordinates, there is an
element of the permutation group ΓAutPr (C) that sends the first set to the second set.
Theorem 8.4.7 Let C be a code of length n over Fq where ΓAut(C) is t-transitive. Then the
codewords of any weight i ≥ t of C hold a t-design.
Proof: Let P be the set of coordinates of the code and B the set of supports of the codewords
of weight i. Of all the t-element subsets of P, let T1 = {i 1 , . . . , i t } ⊆ P be one that is
contained in the maximum number λ of blocks. Suppose these distinct blocks are B1 , . . . , Bλ ,
which are the supports of codewords c1 , . . . , cλ , respectively. Let T2 be any other t-element subset
of P. Then there exists an automorphism g of C whose permutation part maps T1 to T2 . The
codewords c1 g, . . . , cλ g have distinct supports; these are blocks of B. The maximality of λ
shows that T2 is in no more than λ blocks. Hence (P, B) is a t-(n, i, λ) design.
8.5
Codes from symmetric 2-designs
In the previous sections we focused on constructing designs from codes. In this section we
reverse the concept: we investigate what can be said about a code over Fq that is generated by
the rows of an incidence matrix of a design. We will be primarily interested in the minimum
weight and dimension of the code. This is one situation where it is often more difficult to find
the dimension of the code than the minimum distance. As an example of what can be said
about the dimension of a code generated by the rows of the incidence matrix of a design, if
the design is a Steiner t-design and the last row of its extended Pascal triangle indicates that
the binary code generated is self-orthogonal, then the code has dimension at most half the
length. For example, the last row of the extended Pascal triangle of the 5-(24, 8, 1) design
in Figure 8.3 indicates that the blocks overlap in an even number of points and hence that
the binary code generated by the incidence matrix of the design, which is the [24, 12, 8]
extended binary Golay code, is self-orthogonal, a fact that we of course already knew. In
the most important cases, the minimum weight is the block size and the minimum weight
codewords have supports that are precisely the blocks.
In this section we will examine the dimension of codes arising from designs and also
look at codewords whose weights are equal to the block size. If A is the incidence matrix of
a t-design (P, B), let C q (A) be the linear code over Fq spanned by the rows of A. C q (A) is
called the code over Fq of the design (P, B). We will focus on t-designs that are symmetric,
and thus the number of blocks b equals the number of points v. In all other cases of nontrivial
designs, b > v, a fact known as Fisher’s inequality; see [204]. By [4], a nontrivial symmetric
t-design does not exist for t > 2. So we will only consider symmetric 2-designs. A great
deal can be said about the dimension of C q (A) in this case, information that has played a
significant role in the study of projective planes as we will see in Section 8.6. In that section
we will also consider the question of the minimum weight of C q (A) and what codewords
have that minimum weight.
Before proceeding with our results, a few special properties of 2-(v, k, λ) designs are
required. Let Jv be the v × v matrix all of whose entries are 1 and Iv the v × v identity matrix.
Lemma 8.5.1 Let A be the incidence matrix of a 2-(v, k, λ) design. The following hold:
(i) AT A = (λ1 − λ)Iv + λJv and λ(v − 1) = λ1 (k − 1).
(ii) If the design is symmetric, then:
(a) λ1 = k and λ(v − 1) = k(k − 1),
(b) AT A = A AT = (k − λ)Iv + λJv ,
(c) every pair of blocks intersect in exactly λ points,
(d) det(A) = ±k(k − λ)^{(v−1)/2} , and
(e) if v is even, k − λ is a perfect square.
Proof: The ith diagonal entry of AT A counts the number of blocks containing the ith point;
this is λ1 . The entry in row i and column j ≠ i of AT A is the number of blocks containing
both the ith and jth points; this is λ, proving the first equation in (i). The second equation
follows from Theorem 8.1.3 with t = 2.
Now assume the design is symmetric so that b = λ0 = v. Theorem 8.1.6 and part (i)
yield (a). With t = 2 in Theorem 8.1.3,
λ = λ2 = λ1 (k − 1)/(v − 1).   (8.3)
By Exercise 458,
det(AT A) = (k − λ)^{v−1} (vλ + k − λ).   (8.4)
In particular, AT A is nonsingular as λ < k = λ1 by (8.3). Since A is a square matrix, it
must then be nonsingular. As each row of A has k 1s, A Jv = k Jv . As each column of
A has λ1 1s, Jv A = λ1 Jv . Since λ1 = k, Jv A = A Jv . Hence A commutes with Jv . This
implies that A AT = A AT A A−1 = A((k − λ)Iv + λJv )A−1 = ((k − λ)Iv + λJv )A A−1 and
so A AT = AT A giving (b). The entry in row i and column j ≠ i of A AT is λ; this indicates
that distinct blocks intersect in exactly λ points, which is (c). By (a), (8.3), and (8.4),
det(AT A) = (k − λ)^{v−1} k^2 .
This yields (d), which implies (e) as A is an integer matrix.
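The identities of Lemma 8.5.1 are easy to verify on a small symmetric design. Below, a Python sketch checks (ii)(b) and (ii)(d) for the Fano plane, a symmetric 2-(7, 3, 1) design (the line list and the permutation-expansion determinant are our illustrative choices):

```python
from itertools import permutations

# Fano plane: 7 points, 7 lines, a symmetric 2-(7, 3, 1) design
lines = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
         (1, 4, 6), (2, 3, 6), (2, 4, 5)]
v, k, lam = 7, 3, 1
A = [[1 if p in B else 0 for p in range(v)] for B in lines]

# A^T A = (k - lam) I + lam J: k = lam_1 on the diagonal, lam off it
for i in range(v):
    for j in range(v):
        dot = sum(A[r][i] * A[r][j] for r in range(v))
        assert dot == (k if i == j else lam)

def determinant(M):
    """Integer determinant via the permutation expansion (fine for v = 7)."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = (-1) ** sum(perm[a] > perm[b]
                           for a in range(n) for b in range(a + 1, n))
        prod = 1
        for r in range(n):
            prod *= M[r][perm[r]]
        total += sign * prod
    return total

# det(A) = +-k (k - lam)^((v-1)/2) = +-3 * 2^3 = +-24
assert abs(determinant(A)) == k * (k - lam) ** ((v - 1) // 2)
```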
Exercise 458 Prove that
det[(k − λ)Iv + λJv ] = (k − λ)^{v−1} (vλ + k − λ).
We are now ready to prove the two main theorems of this section that give the dimension
of C q (A) over Fq when A is the incidence matrix of a symmetric 2-design. We shall find it
helpful to consider another code Dq (A) over Fq defined to be the code spanned by the rows
of Ā, where Ā is the matrix whose rows are the differences of all pairs of rows of A.
Recall that k − λ is the order of the design. If p is the characteristic of Fq , the dimension can
be bounded above or found exactly depending upon the divisibility of both k − λ and k by p.
Theorem 8.5.2 Let A be the incidence matrix of the symmetric 2-(v, k, λ) design (P, B).
If p is the characteristic of Fq , then the following hold:
(i) Dq (A) is a subcode of C q (A) with codimension at most 1 in C q (A).
(ii) If p | (k − λ), then C q (A) ⊆ Dq (A)⊥ , and Dq (A) is self-orthogonal of dimension at
most v/2.
(iii) If p | (k − λ) and p | k, then C q (A) is self-orthogonal of dimension at most
v/2.
(iv) If p | (k − λ) and p ∤ k, then Dq (A) is of codimension 1 in C q (A), and Dq (A) has
dimension less than v/2.
(v) If p ∤ (k − λ) and p | k, then C q (A) has dimension v − 1.
(vi) If p ∤ (k − λ) and p ∤ k, then C q (A) has dimension v.
Proof: Let r1 , . . . , rv be the rows of A associated to the blocks B1 , . . . , Bv of B.
The supports of the rows of Ā are the symmetric differences of all pairs of blocks.
Clearly, C q (A) = span{r1 } + Dq (A), and Dq (A) is of codimension at most 1 in C q (A),
giving (i).
We first consider the case p | (k − λ). We have ri · ri ≡ k ≡ λ (mod p), and for i ≠ j,
ri · r j ≡ λ (mod p) by Lemma 8.5.1(ii)(c). Therefore ri · r j ≡ λ (mod p) for all i and
j. So (ri − r j ) · rm ≡ 0 (mod p) for all i, j, and m. Thus Dq (A) is self-orthogonal of
dimension at most v/2, and C q (A) ⊆ Dq (A)⊥ , proving (ii). If in addition p | k, then
p | λ and C q (A) is self-orthogonal giving (iii) as ri · r j ≡ λ ≡ 0 (mod p). If p ∤ k, then
p ∤ λ and r1 · r1 ≢ 0 (mod p) implying that Dq (A) is of codimension 1 in C q (A). As
Dq (A) is properly contained in its dual by (ii), Dq (A) cannot be of dimension v/2,
giving (iv).
Now assume that p ∤ (k − λ). If p | k, then ri · 1 ≡ k ≡ 0 (mod p). Thus C q (A) is of
dimension at most v − 1. Associate column i of A with the point pi ∈ P. Let pi , p j ∈ P
with i ≠ j. There are λ_1^1 blocks in B that contain pi and not p j . Thus the sum s j of all rows of
A associated with the blocks not containing p j has 0 in column j and λ_1^1 in all other columns.
By Corollary 8.2.2, λ_1^1 = λ1 − λ2 = k − λ ≢ 0 (mod p). As span{s j | 1 ≤ j ≤ v} is at least
(v − 1)-dimensional, C q (A) has dimension v − 1 giving (v). If p ∤ k, by Lemma 8.5.1,
det(A) ≢ 0 (mod p), yielding (vi).
From the coding theory standpoint, if p ∤ (k − λ), the code C q (A) is uninteresting. When
p | (k − λ), the code can potentially be worth examining. If in addition p^2 ∤ (k − λ), the
dimension of C q (A) can be exactly determined.
Theorem 8.5.3 Let A be the incidence matrix of the symmetric 2-(v, k, λ) design (P, B).
Let p be the characteristic of Fq and assume that p | (k − λ), but p^2 ∤ (k − λ). Then v is
odd, and the following hold:
(i) If p | k, then C q (A) is self-orthogonal and has dimension (v−1)/2.
(ii) If p ∤ k, then Dq (A) = C q (A)⊥ ⊂ C q (A) = Dq (A)⊥ . Furthermore, C q (A) has dimension (v + 1)/2 and Dq (A) is self-orthogonal.
Proof: By Lemma 8.5.1(ii)(e), if v is even, k − λ must be a perfect square, contradicting
p | (k − λ) but p^2 ∤ (k − λ). So v is odd.
View A as an integer matrix. Adding columns 1, 2, . . . , v − 1 of A to column v gives a
matrix A1 with ks in column v because each row of A has k 1s. Subtracting row v of A1
from each of the previous rows gives a matrix A2 of the block form

A2 = [       H         0 ]
     [ a1 · · · av−1   k ] ,

where H is a (v − 1) × (v − 1) integer matrix, the 0 denotes a column of v − 1 zeros, and
each ai is an integer. Furthermore adding
a row, or column, of a matrix to another row, or column, does not affect the determinant.
Hence det(A) = k det(H ). By the theory of invariant factors of integer matrices (see for
example, [187, Appendix C]), there are (v − 1) × (v − 1) integer matrices U and V of
determinant ±1 such that U H V = diag(h 1 , . . . , h v−1 ), where h 1 , . . . , h v−1 are integers
satisfying h i | h i+1 for 1 ≤ i < v − 1. Hence,

det(A) = k det(H ) = kh 1 h 2 · · · h v−1 = ±k(k − λ)^{(v−1)/2} .
Furthermore, the rank r of A over Fq , which is the dimension of C q (A), is the number
of values among {k, h 1 , . . . , h v−1 } that are not zero modulo p. As p | (k − λ) but p^2 ∤
(k − λ), at most (v − 1)/2 of the h i s are 0 modulo p. Hence the dimension of C q (A) is
at least (v − 1)/2 if p | k and at least (v − 1)/2 + 1 = (v + 1)/2 if p ∤ k. Thus if p | k,
by Theorem 8.5.2(iii), C q (A) has dimension (v − 1)/2 as v is odd, giving (i). If p ∤ k,
Dq (A) has codimension 1 in C q (A) and dimension less than v/2 by Theorem 8.5.2(iv),
implying that C q (A) must have dimension (v + 1)/2. This gives part of (ii), with the rest
now following from Theorem 8.5.2.
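Theorem 8.5.3(ii) can be illustrated with the Fano plane: v = 7, k = 3, λ = 1, so p = 2 divides k − λ = 2 while p^2 does not and p ∤ k; the predicted dimension is (v + 1)/2 = 4, and the binary code of the plane is in fact the [7, 4] Hamming code. A sketch in Python (the bitmask row-reduction helper is ours):

```python
# Rows of the Fano plane incidence matrix, encoded as 7-bit masks
lines = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
         (1, 4, 6), (2, 3, 6), (2, 4, 5)]
rows = [sum(1 << p for p in B) for B in lines]

def rank_gf2(rows, width=7):
    """Rank over F_2 by Gaussian elimination on bitmask rows."""
    rows = list(rows)
    rank = 0
    for bit in range(width):
        pivot = next((r for r in rows if (r >> bit) & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        # clear this bit from every remaining row that has it set
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
        rank += 1
    return rank

dim = rank_gf2(rows)   # 4 = (v + 1)/2, the dimension of the [7, 4] Hamming code
```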
The condition that p | (k − λ) but p 2 ∤ (k − λ) is crucial to the theorem, as can be seen
in the proof. When p 2 | (k − λ), the code C q (A) may have smaller dimension, as we will
discover in the next section.
Exercise 459 This exercise provides an alternate proof of Theorem 8.5.3(ii) that does not
rely on invariant factors. The dimension r of C q (A) is the rank of A over Fq , which is
the rank of A over F p as A has integer entries. Hence r is also the dimension of C p (A).
Assume that the points of P have been ordered so that the right-most r coordinates form
an information set for C p (A).
(a) Explain why there exists a (v − r ) × r matrix B with entries in F p such that [Iv−r B]
is a parity check matrix for C p (A).
(b) Let M be the v × v matrix

    M = [ Iv−r   B  ]
        [  O     Ir ]

viewed as an integer matrix. Explain why the integer matrix A1 = M AT satisfies
det(A1 ) = det(A) and why each entry in the first v − r rows of A1 is a multiple of p.
(c) Prove that p v−r | det(A).
(d) Prove that as p | (k − λ), p^2 ∤ (k − λ), and p ∤ k, then v − r ≤ (v − 1)/2 and the
dimension of C q (A) is (v + 1)/2.
We know that codewords in C q (A) whose supports are blocks have weight k. A natural
question is to ask whether codewords of weight k exist whose supports are not blocks; this
becomes even more interesting when k is the minimum weight of the code. We begin our
investigation of this question with a lemma found in [187].
Lemma 8.5.4 Let (P, B) be a symmetric 2-(v, k, λ) design. Let S ⊂ P with |S| = k.
Suppose that S meets all blocks of B in at least λ points. Then S is a block.
Proof: For all B ∈ B, let n B = |S ∩ B| − λ; by assumption n B is a nonnegative integer.
Also,

Σ_{B∈B} n B = Σ_{B∈B} |S ∩ B| − vλ.   (8.5)
Let X = {(x, B) | B ∈ B, x ∈ S ∩ B}. If x ∈ S, x is in k blocks B by Lemma 8.5.1.
Thus |X | = k^2 . But for every block B ∈ B, there are |S ∩ B| points x ∈ S ∩ B. So |X | =
Σ_{B∈B} |S ∩ B|. Thus Σ_{B∈B} |S ∩ B| = k^2 , implying from (8.5) and Lemma 8.5.1 that

Σ_{B∈B} n B = k − λ.   (8.6)
Let Y = {(x, y, B) | B ∈ B, x, y ∈ S ∩ B, x ≠ y}. There are k(k − 1) ordered pairs of
elements in S each in exactly λ blocks. Thus |Y| = k(k − 1)λ. But for every block B ∈ B,
there are |S ∩ B|(|S ∩ B| − 1) = (n B + λ)(n B + λ − 1) pairs of points in S ∩ B. Thus

Σ_{B∈B} (n B + λ)(n B + λ − 1) = k(k − 1)λ.

Expanding the left-hand side and using (8.6), we obtain (see Exercise 460)

Σ_{B∈B} n B^2 = (k − λ)^2 .   (8.7)
As the n B are nonnegative integers, by (8.6) and (8.7), the only possibility is n B = 0 for all
but one B and n B = k − λ for this one B; but then |S ∩ B| = k, implying S = B.
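For a concrete instance of the counts (8.6) and (8.7), take the Fano plane, a symmetric 2-(7, 3, 1) design, and let S be one of its lines (so S meets every line in at least λ = 1 point, and the lemma forces S to be a block). A quick Python check:

```python
# Fano plane; take S equal to a block and compute n_B = |S cap B| - lam
lines = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
         (1, 4, 6), (2, 3, 6), (2, 4, 5)]
k, lam = 3, 1
S = set(lines[0])
n_B = [len(S & set(B)) - lam for B in lines]       # nonnegative integers
assert sum(n_B) == k - lam                         # equation (8.6)
assert sum(n * n for n in n_B) == (k - lam) ** 2   # equation (8.7)
```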
Exercise 460 In the proof of Lemma 8.5.4, verify (8.7).
We now show that certain codewords of weight k in the code C q (A), where A is the
incidence matrix of a symmetric 2-(v, k, λ) design (P, B), have supports that are blocks
of the design. If S ⊂ P, let c S denote the vector in F_q^v whose nonzero components are all 1
and whose support is S.
Theorem 8.5.5 Let (P, B) be a symmetric 2-(v, k, λ) design with incidence matrix A. Let
p > λ be a prime dividing k − λ. Suppose Fq has characteristic p. Finally, let S ⊂ P have
size k. If c S ∈ C q (A), then S is a block in B.
Proof: As c S ∈ C q (A) and C q (A) is generated by {c B | B ∈ B}, c S = Σ_{B∈B} a B c B for a B ∈
Fq . Let B ′ be a block in B. Then

c S · c B ′ = Σ_{B∈B} a B c B · c B ′ .   (8.8)
For B ∈ B, as the only entries in c B and c B ′ are 0s and 1s, c B · c B ′ is actually in F p ⊆ Fq and
clearly equals |B ∩ B ′ | modulo p. If B ≠ B ′ , |B ∩ B ′ | = λ by Lemma 8.5.1; if B = B ′ ,
|B ∩ B ′ | = k. As p | (k − λ), k ≡ λ (mod p). Thus |B ∩ B ′ | ≡ k (mod p) for all B ∈ B
and the right-hand side of (8.8) becomes

Σ_{B∈B} a B c B · c B ′ = Σ_{B∈B} ka B ,

implying

c S · c B ′ = Σ_{B∈B} ka B .   (8.9)
The codeword c S is obtained by multiplying each row of A corresponding to B by a B thus
creating a matrix A′ and adding the rows of A′ in Fq . The sum of all the entries in A′ is
Σ_{B∈B} ka B as the row corresponding to B is the sum of a B taken k times. This must be the
sum of the entries in c S in Fq . But the sum of the entries in c S is the sum of k 1s in Fq . As
above, c S · c B ′ is again in F p ⊆ Fq and equals |S ∩ B ′ | modulo p. Hence (8.9) becomes
|S ∩ B ′ | ≡ k ≡ λ (mod p).
As p > λ, we must have |S ∩ B ′ | ≥ λ for all B ′ ∈ B. The result follows from
Lemma 8.5.4.
This theorem shows that if p > λ, the codewords in C q (A) of weight k that are multiples
of binary vectors must have supports that are blocks. There are examples of designs showing
that you cannot drop either the condition p > λ or the condition that the codeword of weight
k must be constant on its support.
Example 8.5.6 There are three inequivalent symmetric 2-(16, 6, 2) designs. If A is the
incidence matrix of one of these designs, then C 2 (A) has dimension 6, 7, or 8. If we let Ai
be the incidence matrix where C i = C 2 (Ai ) has dimension i, the weight enumerators are
given in [302]:
WC6 (x, y) = y^16 + 16x^6 y^10 + 30x^8 y^8 + 16x^10 y^6 + x^16 ,
WC7 (x, y) = y^16 + 4x^4 y^12 + 32x^6 y^10 + 54x^8 y^8 + 32x^10 y^6 + 4x^12 y^4 + x^16 ,
WC8 (x, y) = y^16 + 12x^4 y^12 + 64x^6 y^10 + 102x^8 y^8 + 64x^10 y^6 + 12x^12 y^4 + x^16 .
Notice that the 16 blocks are precisely the supports of the weight 6 codewords only in the
first code. In the other cases there are codewords of weight 6 whose supports are not blocks.
Thus the condition p > λ in Theorem 8.5.5 is necessary. Notice that in the last two cases
the minimum weight is less than the block size.
Example 8.5.7 In Exercise 471, you are asked to construct an (11, 5, 2) cyclic difference
set that will yield a symmetric 2-(11, 5, 2) design D. Let A be its incidence matrix. By
Theorem 8.5.3(ii), C 3 (A) is an [11, 6] code. It turns out that the code has minimum weight
5 and that there are codewords of weight 5 whose supports are not blocks of the original
design; these are codewords with nonzero components a mixture of both 1 and 2. Thus the
condition that the codeword of weight k must be constant on its support is also necessary
in Theorem 8.5.5.
314
Designs
There are other designs related to symmetric 2-designs that have a strong relationship to
codes. For example, a quasi-symmetric design is a 2-design in which every pair of distinct
blocks intersects in either α or β points. Whether or not there exists a quasi-symmetric
2-(49, 9, 6) design with distinct blocks intersecting in 1 or 3 points was an open question
until 1995. If such a design exists, it has 196 blocks by Theorem 8.1.6. Its existence is
demonstrated as follows. One possible weight enumerator of a binary self-dual [50, 25, 10]
code C is WC (x, y) = y^50 + 196x^10 y^40 + 11368x^12 y^38 + · · · . If a code with this weight
enumerator exists, the supports of all 196 minimum weight codewords amazingly must
have one coordinate in common. Deleting this common coordinate from these supports
produces a quasi-symmetric 2-(49, 9, 6) design with distinct blocks intersecting in 1 or 3
points. In [151], four inequivalent codes of this type are given, establishing the existence of
at least four quasi-symmetric designs. Further discussion on quasi-symmetric designs and
other connections between codes and designs can be found in [332].
We conclude this section with a result found in [13] that applies to codes holding 2-designs
that may not be symmetric. We include it here because its proof is reminiscent of
other proofs in this section.
Theorem 8.5.8 Let C be a self-orthogonal [n, k, d] code over Fq with d ≥ 2. Suppose that
the minimum weight codewords of C hold a 2-design. Then
d ≥ 1 + √(n − 1).
Proof: Let B be a block in the 2-(n, d, λ) design (P, B) held by the minimum weight
codewords of C. Suppose the design has b blocks. Let n i be the number of blocks in B, excluding
B, that meet B in i points. Form a (b − 1) × d matrix M with columns indexed by the d
points of B and rows indexed by the b − 1 blocks of B excluding B. The entry in row j
and column k is 1 provided the kth point of B is in the block corresponding to the jth row
of M. All other entries of M are 0.
We verify the following two equations:
Σ_{i=0}^{d−1} i n i = d(λ1 − 1)   (8.10)

and

Σ_{i=0}^{d−1} i(i − 1)n i = d(d − 1)(λ − 1).   (8.11)
In M there are n i rows with i 1s. Thus the total number of 1s in M is Σ_{i=0}^{d−1} i n i . As every
point of the design is in λ1 blocks, there are λ1 − 1 1s in every column of M and hence a
total of d(λ1 − 1) 1s in M. Equating these two counts gives (8.10). In a row of M with i
1s, there are (i choose 2) pairs of 1s. Thus in the rows of M there are a total of
Σ_{i=0}^{d−1} (i choose 2) n i pairs of 1s. As every pair of points is in λ blocks, fixing any pair
of columns of M, there are λ − 1 rows of M that contain 1s in these two columns. Thus there
are a total of (d choose 2)(λ − 1) pairs of 1s in the rows of M. Equating these two counts
and doubling the result gives (8.11).
As C is self-orthogonal, no pair of blocks of B can intersect in exactly one point.
Hence n 1 = 0. Thus i(i − 1)n i ≥ in i for all i. Using this with (8.10) and (8.11) yields
(d − 1)(λ − 1) ≥ (λ1 − 1). As λ1 (d − 1) = (n − 1)λ by Lemma 8.5.1(i),
(d − 1)^2 (λ − 1) ≥ (λ1 − 1)(d − 1) = (n − 1)λ − (d − 1).
So (d^2 − 2d + 2 − n)λ ≥ (d − 1)(d − 2) ≥ 0, implying d^2 − 2d + 2 − n ≥ 0. Solving
this inequality yields the result.
Example 8.5.9 We use this result to find a lower bound on the minimum weight of certain
codes.
• Let G24 be the extended binary Golay code. This code is self-dual and doubly-even. In
  Section 10.1.2 we will see that PAut(G24 ) is 5-transitive. In particular, PAut(G24 ) is
  2-transitive and, by Theorem 8.4.7, the minimum weight codewords of G24 hold a 2-design.
  Theorem 8.5.8 shows the minimum weight is at least 1 + √23, that is, at least 6. Since
  G24 is doubly-even, the minimum weight is at least 8.
• Let G12 be the extended self-dual ternary Golay code. In Section 10.4.2 we show that
  MAut(G12 ) is 5-transitive and so 2-transitive. By Theorem 8.4.7, the minimum weight
  codewords of G12 hold a 2-design, and Theorem 8.5.8 shows the minimum weight is
  at least 1 + √11. Thus the minimum weight is at least 5. As all codewords of ternary
  self-dual codes have weights a multiple of 3, the minimum weight is at least 6.
• Let C be a self-orthogonal extended primitive BCH code of length q^m, where q is
  a prime power. By Theorem 5.1.9, C is affine-invariant and hence PAut(C) contains
  GA1 (q^m), which is 2-transitive. Hence Theorem 8.4.7 shows the minimum weight
  codewords hold a 2-design, and Theorem 8.5.8 shows the minimum weight is at least
  1 + √(q^m − 1).
8.6 Projective planes
We now apply the results of the previous section to the specific case of projective planes.
In Section 8.1, we indicated that a symmetric 2-(v, k, 1) design is precisely a projective plane as defined in Section 6.5; the lines of the plane are the blocks of the design. The order µ of a symmetric 2-(v, k, 1) design is k − 1. (In many books the order
of a plane is denoted by n; however, because we so often use n to denote the length
of a code, we choose the symbol µ to denote the order of a plane.) By Theorem 8.1.6,
v = µ² + µ + 1. Thus a projective plane is a symmetric 2-(µ² + µ + 1, µ + 1, 1) design. However, a 2-(µ² + µ + 1, µ + 1, 1) design is automatically symmetric because
Theorem 8.1.6 shows that such a design has µ² + µ + 1 blocks. We now prove this basic
equivalence.
Theorem 8.6.1 A set of points and lines is a finite projective plane if and only if it is a
symmetric 2-(v, k, 1) design with v ≥ 4.
Proof: Let P and L denote the points and lines of a finite projective plane. We show that
(P, B) is a symmetric 2-(v, k, 1) design where B = L and v is the number of points P. We
first must show that every line has the same number of points. Let ℓ be a line with k points
where k is as large as possible. Let m be any other line. Pick any point P not on either ℓ
or m; such a point exists because the plane contains at least four points no three of which
are on the same line. There are k distinct lines through P and the k points on ℓ. Each of
these k lines intersects m in distinct points, since two points determine a unique line. Thus
m contains at least k points. By maximality of k, m has exactly k points. So all lines have
k points. As every pair of distinct points determines a unique line, (P, B) is a 2-(v, k, 1)
design. We need to show that it is symmetric. Let b be the number of lines and let
X = {(P, ℓ) | P ∈ P, ℓ ∈ L, P ∈ ℓ}.
As every line contains k points, |X | = bk. If P is a point, then there are k lines through P
as follows. Let m be a line not containing P. Each of its k points determines a unique line
through P and hence there are at least k lines through P. Any line through P must intersect
m in a unique point, with different lines intersecting at different points, and so there are at
most k lines through P. Thus there are k lines through P. Hence |X | = vk; equating the
two counts for |X | gives b = v. Thus the design is symmetric; as a projective plane has at
least four points, v ≥ 4.
Now assume (P, B) is a symmetric 2-(v, k, 1) design. Then every pair of points is in
a unique block, and by Lemma 8.5.1, two distinct blocks intersect in a unique point. As
b = v ≥ 4, there are at least four points, no three of which are in the same block. Hence
(P, B) is a projective plane.
From this proof, we now have some basic properties of projective planes.
Theorem 8.6.2 A projective plane of order µ satisfies the following properties:
(i) The plane has µ² + µ + 1 points and µ² + µ + 1 lines.
(ii) Distinct points determine a unique line and distinct lines intersect in a unique point.
(iii) Every line contains exactly µ + 1 points and every point is on exactly µ + 1 lines.
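Theorem 8.6.2 can be verified computationally for a small order. The sketch below (an illustration under the classical model, introduced later in this section, in which points and lines of PG(2, p) are the 1- and 2-dimensional subspaces of F_p³; the helper names are ours) checks properties (i)–(iii) for p = 3:

```python
from itertools import combinations

p = 3  # build PG(2, p), a projective plane of order µ = p

# one representative per 1-dimensional subspace of F_p^3 (first nonzero entry 1)
pts = ([(1, y, z) for y in range(p) for z in range(p)]
       + [(0, 1, z) for z in range(p)] + [(0, 0, 1)])
lines = pts  # by duality, lines are described by the same normalized triples
on = lambda P, L: (P[0] * L[0] + P[1] * L[1] + P[2] * L[2]) % p == 0

v = len(pts)
assert v == p * p + p + 1                                        # property (i)
assert all(sum(on(P, L) for P in pts) == p + 1 for L in lines)   # (iii), lines
assert all(sum(on(P, L) for L in lines) == p + 1 for P in pts)   # (iii), points
assert all(sum(on(P, L) and on(Q, L) for L in lines) == 1
           for P, Q in combinations(pts, 2))                     # (ii)
print(v)  # 13
```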
We can determine the minimum distance, find the minimum weight codewords, and either
bound or find exactly the dimension of C 2 (A) when A is the incidence matrix of a projective
plane of order µ.
Theorem 8.6.3 Let A be the incidence matrix of a projective plane of order µ arising from
the symmetric 2-(v, k, 1) design (P, B).
(i) If µ = k − 1 is odd, then C 2 (A) is the [v, v − 1, 2] code consisting of all even weight
codewords in Fv2 .
(ii) If µ = k − 1 is even, then the extended code Ĉ 2 (A) of C 2 (A) is self-orthogonal of
dimension at most (v + 1)/2. The minimum weight of C 2 (A) is µ + 1 and the lines of
(P, B) are precisely the supports of the vectors of this weight.
Proof: Suppose that µ = k − 1 is odd. Then 2 ∤ µ and 2 | k. By Theorem 8.5.2, C 2 (A) has
dimension v − 1. As all blocks have an even number k of points, C 2 (A) must be an even
code. Therefore (i) holds.
Now suppose that µ = k − 1 is even. Then as the blocks have odd weight and distinct
blocks meet in one point, the extended code Ĉ 2 (A) is generated by a set of codewords of
even weight whose supports agree on two coordinates, one being the extended coordinate.
Thus Ĉ 2 (A) is self-orthogonal and hence has dimension at most (v + 1)/2. Also as Ĉ 2 (A) is
self-orthogonal, two even weight vectors in C 2 (A) meet in an even number of coordinates
while two odd weight vectors in C 2 (A) meet in an odd number of coordinates. We only need
to find the minimum weight codewords of C 2 (A).
Suppose x is a nonzero vector in C 2 (A) of even weight where supp(x) contains the point
P. Let ℓ be any of the µ + 1 lines through P. As x has even weight, supp(x) must intersect
ℓ in an even number of points. Thus ℓ meets supp(x) at P and another point Q ℓ . Since
two points determine a unique line, the µ + 1 lines ℓ determine µ + 1 different points Q ℓ
showing that wt(x) ≥ µ + 2.
Suppose now that x is a vector in C 2 (A) of odd weight. If wt(x) = 1, then there is a
codeword c whose support is a line containing supp(x); hence x + c is a codeword of
even weight µ, a contradiction to the above. So assume supp(x) contains points P and Q.
Now supp(x) must meet each line in an odd number of points. Assume supp(x) is not the
unique line ℓ containing P and Q. Then supp(x) must meet ℓ in at least one additional
point R. If ℓ ⊆ supp(x), then wt(x) ≥ (µ + 1) + 2 as both µ + 1 and wt(x) are odd. If
ℓ ⊄ supp(x), then there is a point S on ℓ with S not in supp(x). The µ lines through S,
excluding ℓ, meet supp(x) in at least µ distinct points none of which are P, Q, or R. Thus
if supp(x) ≠ ℓ, wt(x) ≥ µ + 3, completing part (ii).
We remark that part (ii) of this theorem is consistent with Theorem 8.5.5; since p =
2 > λ = 1, the codewords of weight k in C 2 (A) are precisely those whose supports are the
blocks. Theorem 8.6.3 indicates that the most interesting codes arise when µ is even. If
2 | µ but 4 ∤ µ, we can give even more information about the code C 2 (A).
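Theorem 8.6.3(ii) can be checked directly in the smallest even case µ = 2. In this sketch (our own check, again taking the Fano plane as the plane of order 2), the F_2-span of the incidence rows is enumerated as bitmasks, and we confirm dimension (v + 1)/2 = 4, minimum weight µ + 1 = 3, and that the minimum weight supports are exactly the lines:

```python
# incidence rows of the Fano plane (order µ = 2) as 7-bit masks
lines = [{(1 + i) % 7, (2 + i) % 7, (4 + i) % 7} for i in range(7)]
rows = [sum(1 << pt for pt in ln) for ln in lines]

# C_2(A): the F_2-span of the rows, built by closing under XOR
span = {0}
for r in rows:
    span |= {c ^ r for c in span}

dim = len(span).bit_length() - 1           # |span| = 2^dim
wt = lambda c: bin(c).count('1')
min_wt = min(wt(c) for c in span if c)
weight3 = {c for c in span if wt(c) == 3}
print(dim, min_wt, weight3 == set(rows))   # 4 3 True
```

The resulting code is the [7, 4, 3] binary Hamming code.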
Theorem 8.6.4 Let (P, B) be a projective plane of order µ ≡ 2 (mod 4) with incidence
matrix A. Then the following hold:
(i) C 2 (A) is a [v, (v + 1)/2, µ + 1] code, where v = µ² + µ + 1. Furthermore, C 2 (A)
contains the all-one vector.
(ii) The code D2 (A) is the subcode of C 2 (A) consisting of all its even weight vectors,
D2 (A) = C 2 (A)⊥ , D2 (A) is doubly-even, and C 2 (A) is generated by D2 (A) and the
all-one vector.
(iii) The extended code Ĉ 2 (A) of C 2 (A) is a self-dual doubly-even [v + 1, (v + 1)/2,
µ + 2] code, and all codewords of C 2 (A) have weights congruent to 0 or 3 modulo 4.
Proof: By Theorem 8.6.2(i), v = µ² + µ + 1. As 2 | µ but 4 ∤ µ, by Theorem 8.5.3(ii),
the dimension of C 2 (A) is (v + 1)/2. The minimum weight is µ + 1 by Theorem 8.6.3.
The sum of all the rows of A is the all-one vector because every point is on µ + 1 lines by
Theorem 8.6.2(iii), completing the proof of (i).
The code D2 (A) is generated by vectors that are the difference of rows of A and hence
have weight 2(µ + 1) − 2 ≡ 0 (mod 4) as distinct lines have µ + 1 points with exactly one
point in common. As D2 (A) = C 2 (A)⊥ ⊂ C 2 (A) = D2 (A)⊥ by Theorem 8.5.3(ii), D2 (A)
is self-orthogonal and hence doubly-even. The code D2 (A) is contained in the even weight
subcode of C 2 (A). But both D2 (A) and the even weight subcode of C 2 (A) have codimension
1 in C 2 (A) by Theorem 8.5.2(iv) using the fact that the rows of A have odd weight µ + 1.
This verifies (ii).
Part (iii) follows from parts (i) and (ii) with the observation that the all-one vector in C 2 (A) extends to the all-one vector of length µ² + µ + 2 ≡ 0 (mod 4) and codewords in C 2 (A) of minimum weight extend to codewords of weight µ + 2 ≡ 0 (mod 4) in Ĉ 2 (A).
When 4 | µ, the code Ĉ 2 (A) is still self-orthogonal of minimum weight µ + 2 by Theorem 8.6.3; however, it may not be self-dual. We consider this case later.
We now examine sets of points in a projective plane no three of which are collinear.
If O is such a set of points, fix a point P on O. Then P together with every other point
in O determines different lines through P. Therefore O can have at most µ + 2 points as
Theorem 8.6.2 shows that every point has µ + 1 lines through it. Now assume O has exactly
µ + 2 points. Then every line through P must intersect O at exactly one other point. Any
line that intersects O does so at some point of O and hence meets O at exactly one other point. Thus
every line either meets O in zero or two points. Now choose a point Q not on O. Then
any line through Q meeting O meets it in another point. All lines through Q meeting O
partition O into pairs of points that determine these lines implying that µ is even. So if O
has µ + 2 points, then µ is even.
An oval of a projective plane of order µ is a set of µ + 2 points, if µ is even, or a set
of µ + 1 points, if µ is odd, which does not contain three collinear points. So ovals are the
largest sets possible in a projective plane with no three points collinear. There is no guarantee
that ovals exist. The interesting case for us is when µ is even. Our above discussion proves
the following.
Theorem 8.6.5 Any line intersects an oval in a projective plane of even order in either zero
or two points.
If ovals exist in a projective plane of order µ ≡ 2 (mod 4), then the ovals play a special
role in C 2 (A).
Theorem 8.6.6 Let (P, B) be a projective plane of even order µ with incidence matrix A.
Then the ovals of (P, B) are precisely the supports of the codewords of weight µ + 2 in
C 2 (A)⊥ . Furthermore, the minimum weight of C 2 (A)⊥ is at least µ + 2.
Proof: Suppose first that x ∈ C 2 (A)⊥ has weight µ + 2. Then every line must be either
disjoint from supp(x) or meet it in an even number of points as every line is the support of a
codeword in C 2 (A). Suppose P is a point in supp(x). Every one of the µ + 1 lines through
P must intersect supp(x) in at least one other point; P and these µ + 1 points account for
all of the points of supp(x). Hence no line meets supp(x) in more than two points and so
supp(x) is an oval.
For the converse, suppose that O is an oval in (P, B). Let x be the binary vector with
supp(x) = O. As O meets any line in an even number of points by Theorem 8.6.5, x is
orthogonal to any codeword whose support is a line. As such codewords generate C 2 (A),
x ∈ C 2 (A)⊥ .
The fact that the minimum weight of C 2 (A)⊥ is at least µ + 2 is left as Exercise 461.
Exercise 461 Let (P, B) be a projective plane of even order µ with incidence matrix A.
Prove that the minimum weight of C 2 (A)⊥ is at least µ + 2.
Corollary 8.6.7 Let (P, B) be a projective plane of order µ ≡ 2 (mod 4) with incidence
matrix A. Then the ovals of (P, B) are precisely the supports of the codewords of weight
µ + 2 in C 2 (A).
Proof: By Theorem 8.6.4, C 2 (A)⊥ is the subcode of even weight vectors in C 2 (A). The
result follows from Theorem 8.6.6.
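Corollary 8.6.7 can be confirmed by brute force for µ = 2 ≡ 2 (mod 4). The sketch below (our illustration on the Fano plane) computes the 4-subsets with no three collinear points and compares them with the supports of the weight-4 codewords of C 2 (A):

```python
from itertools import combinations

lines = [{(1 + i) % 7, (2 + i) % 7, (4 + i) % 7} for i in range(7)]
rows = [sum(1 << pt for pt in ln) for ln in lines]
span = {0}
for r in rows:                  # C_2(A) as the F_2-span of the incidence rows
    span |= {c ^ r for c in span}

# ovals: sets of µ + 2 = 4 points with no three collinear
ovals = {sum(1 << pt for pt in S) for S in combinations(range(7), 4)
         if all(len(set(S) & ln) <= 2 for ln in lines)}
weight4 = {c for c in span if bin(c).count('1') == 4}
print(len(ovals), ovals == weight4)   # 7 True
```

Each of the seven ovals found here is the complement of a line, consistent with the all-one vector lying in C 2 (A).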
We have settled the question of the dimension of C 2 (A) for a projective plane of order
µ except when 4 | µ. If 4 | µ, Ĉ 2 (A) is self-orthogonal and C 2 (A) has minimum weight
µ + 1 by Theorem 8.6.3; however, we do not know the dimension of C 2 (A). A special case
of this was considered by K. J. C. Smith [316], where he examined designs arising from
certain projective geometries. The next theorem, which we present without proof, gives a
part of Smith’s result. The projective plane denoted PG(2, 2^s) is the plane whose points are
the 1-dimensional subspaces of F_q^3, where q = 2^s, and whose lines are the 2-dimensional
subspaces of F_q^3.
Theorem 8.6.8 If (P, B) is the projective plane PG(2, 2^s), then C 2 (A) has dimension
3^s + 1.
If µ is a power of any prime, then PG(2, µ) yields a projective plane in the same fashion
that PG(2, 2^s) did above. The next theorem presents a special case of a result by F. J.
MacWilliams and H. B. Mann [216] that gives the dimension of certain C p (A) when p is
an odd prime.
Theorem 8.6.9 If (P, B) is the projective plane PG(2, p^s) where p is an odd prime, then
C p (A) has dimension (p(p + 1)/2)^s + 1.
Combinatorists have long been fascinated with the problem of finding values µ for which
there is a projective plane of that order or proving that no such plane can exist. Since PG(2, µ)
is a projective plane of order µ when µ is a power of a prime, projective planes exist for all
prime power orders. There are three projective planes of order 9 that are not equivalent to
PG(2, 9) or each other; see [182]. However, at the present time, there is no known projective
plane of any order other than a prime power. In fact it is conjectured that projective planes
of nonprime power orders do not exist.
Research Problem 8.6.10 Find a projective plane of order that is not a prime power or
show that no such plane exists.
The most useful result that allows one to show that a symmetric 2-(v, k, λ) design cannot
exist is the Bruck–Ryser–Chowla Theorem.
Theorem 8.6.11 (Bruck–Ryser–Chowla) If there exists a symmetric 2-(v, k, λ) design
where µ = k − λ, then either:
(i) v is even and µ is a square, or
(ii) v is odd and z² = µx² + (−1)^((v−1)/2) λy² has an integer solution with not all of x, y,
and z equal to 0.
Table 8.2 Parameters of binary codes from projective planes

µ      Ĉ 2 (A)            µ      Ĉ 2 (A)
2      [8, 4, 4]          16∗    [274, 82, 18]
4      [22, 10, 6]        18†    [344, 172, 20]
8      [74, 28, 10]       20†    [422, ≤211, 22]
12†    [158, ≤79, 14]     24†    [602, ≤301, 26]
Note that (i) of the Bruck–Ryser–Chowla Theorem is Lemma 8.5.1(ii)(e). The following
is a consequence of the Bruck–Ryser–Chowla Theorem [187]. The notation in this theorem
is as follows. If i is a positive integer, let i∗ be defined as the square-free part of i; that is,
i = i∗ j, where j is a perfect square and i∗ is 1 or the product of distinct primes.
Theorem 8.6.12 Suppose that there exists a symmetric 2-(v, k, λ) design with order µ =
k − λ. Let p | µ where p is an odd prime.
(i) If p ∤ µ∗ and p | λ∗, then µ is a square modulo p.
(ii) If p | µ∗ and p ∤ λ∗, then (−1)^((v−1)/2) λ∗ is a square modulo p.
(iii) If p | µ∗ and p | λ∗, then (−1)^((v+1)/2) (λ∗/p)(µ∗/p) is a square modulo p.
As a corollary, we can prove the following result about the existence of projective planes.
Corollary 8.6.13 Suppose that there exists a projective plane of order µ where
µ ≡ 1 (mod 4) or µ ≡ 2 (mod 4). If p is an odd prime with p | µ∗ , then p ≡ 1
(mod 4).
Proof: The number of points v of the plane is v = µ² + µ + 1. Hence (v − 1)/2 is odd
when µ is as stated. Thus by Theorem 8.6.12(ii), −1 is a square modulo p. By Lemma 6.2.4,
p ≡ 1 (mod 4).
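Corollary 8.6.13 is a purely arithmetic screen and is easy to automate. A possible implementation sketch (the helper names are our own) computes the square-free part µ∗ and searches it for an odd prime divisor p ≡ 3 (mod 4):

```python
def squarefree_part(n):
    # write n = s * (perfect square) with s square-free; return s
    s, d = 1, 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e % 2:
            s *= d
        d += 1
    return s * n   # any leftover n is a prime occurring to the first power

def ruled_out(mu):
    # True if Corollary 8.6.13 excludes a projective plane of order mu
    if mu % 4 not in (1, 2):
        return False
    s = squarefree_part(mu)
    if s % 2 == 0:
        s //= 2
    d = 3
    while d * d <= s:          # scan odd prime divisors of the square-free part
        if s % d == 0:
            if d % 4 == 3:
                return True
            s //= d
        else:
            d += 2
    return s > 1 and s % 4 == 3

print([mu for mu in range(2, 30) if ruled_out(mu)])  # [6, 14, 21, 22]
```

Note that µ = 10 survives this screen, consistent with the discussion that follows.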
Exercise 462 Do the following:
(a) Show that a projective plane of order 6 does not exist.
(b) Show that a projective plane of order 14 does not exist.
(c) Show that Corollary 8.6.13 does not settle the question of the existence of a projective
plane of order 10.
(d) For what values of µ, with µ < 100, does Corollary 8.6.13 show that a projective plane
of order µ does not exist?
Table 8.2 gives the length, dimension, and minimum distance of Ĉ 2 (A), where A is the
incidence matrix of a projective plane of even order µ ≤ 24. If µ is odd, the code C 2 (A)
is the [µ² + µ + 1, µ² + µ, 2] code of all even weight vectors by Theorem 8.6.3. As
mentioned earlier, planes of prime power order exist. The planes of orders 6, 14, and 22
do not exist by the Bruck–Ryser–Chowla Theorem (see Exercise 462). The smallest order
not settled by Corollary 8.6.13 is 10. A monumental effort was made to construct a plane
of order 10, and eventually it was shown that no such plane exists; we discuss this in
Section 8.8. Planes of orders 12, 18, 20, and 24 may or may not exist; in the table these
entries are marked with a †. We note that the projective planes of order 8 or less are unique
up to equivalence [4], and there are four inequivalent projective planes of order 9 [182]. For
given µ in the table, the minimum distance of Ĉ 2 (A) is µ + 2 and the dimension is at most
(µ² + µ + 2)/2 by Theorem 8.6.3(ii). When µ = 2 or 18, the dimension equals (µ² + µ +
2)/2 by Theorem 8.6.4(iii). For µ = 4 or 8, the unique projective plane is PG(2, µ), and the
dimension of Ĉ 2 (A) is given by Theorem 8.6.8. If µ = 16, the only plane included in the
table is PG(2, 16), and for that reason the value µ = 16 is marked with a ∗; the dimension
of the associated code is again given by Theorem 8.6.8.
8.7 Cyclic projective planes
In this section we discuss cyclic projective planes and explore the relationship of difference
sets to cyclic planes. We will use the theory of duadic codes to show the nonexistence of
certain cyclic projective planes.
Recall that a projective plane on v points is cyclic if there is a v-cycle in its automorphism
group. For example, the projective plane PG(2, q) of order µ = q, where q is a prime power,
has an automorphism which is a v-cycle where v = µ² + µ + 1; this is called a Singer cycle
(see [4]). In general, a cyclic design is one that has a v-cycle in its automorphism group. By
renaming the points, we may assume that the points are the cyclic group Zv under addition
modulo v and that the cyclic automorphism is the map i → i + 1 (mod v). We make this
assumption whenever we speak of cyclic designs. We begin with a theorem relating cyclic
designs to duadic codes that will prove quite useful when we look at cyclic projective planes.
Theorem 8.7.1 Let A be the incidence matrix of a cyclic symmetric 2-(v, k, λ) design
(P, B). Let p be the characteristic of Fq and assume that p | (k − λ) but p 2 ∤ (k − λ).
Then the following hold:
(i) If p | k, then C q (A) is a self-orthogonal even-like duadic code of dimension (v − 1)/2
whose splitting is given by µ_{−1}.
(ii) If p ∤ k, then Dq (A) is a self-orthogonal even-like duadic code of dimension (v − 1)/2
whose splitting is given by µ_{−1}. Also Dq (A)⊥ = C q (A).
Proof: As the rows of A are permuted by the map i → i + 1 (mod v), C q (A) must be a
cyclic code. As differences of rows of A are also permuted by this map, Dq (A) is also a
cyclic code. By Theorem 8.5.3 the dimensions and self-orthogonality of C q (A) in (i) and
Dq (A) in (ii) are as claimed. The remainder of the result follows from Theorems 6.4.1 and
8.5.3.
In this section we will need the following theorem comparing the action of a design
automorphism on blocks and on points.
Theorem 8.7.2 An automorphism of a symmetric 2-design fixes the same number of blocks
as it fixes points.
Proof: Let A be the incidence matrix of a symmetric 2-design D = (P, B) and let τ ∈
Aut(D). Then there are permutation matrices P and Q such that P AQ = A, where P gives
the action of τ on the blocks B, and Q gives the action of τ on the points P. By Lemma 8.5.1,
A is nonsingular over the rational numbers. Therefore AQ A−1 = P −1 = P T , and Q and
P have the same trace. As the number of fixed points of a permutation matrix is the trace
of the matrix, the result follows.
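Theorem 8.7.2 can be confirmed exhaustively for a small design. The brute-force sketch below (our own check, iterating over all 7! point permutations of the Fano plane) verifies that every design automorphism fixes exactly as many blocks as points:

```python
from itertools import permutations

blocks = {frozenset(((1 + i) % 7, (2 + i) % 7, (4 + i) % 7)) for i in range(7)}

n_auts = 0
for perm in permutations(range(7)):
    images = {frozenset(perm[pt] for pt in blk) for blk in blocks}
    if images == blocks:                          # perm is a design automorphism
        fixed_pts = sum(perm[i] == i for i in range(7))
        fixed_blks = sum(frozenset(perm[pt] for pt in blk) == blk
                         for blk in blocks)
        assert fixed_pts == fixed_blks            # Theorem 8.7.2
        n_auts += 1
print(n_auts)  # 168
```

The automorphism count 168 is the order of the collineation group of the Fano plane.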
Corollary 8.7.3 An automorphism of a symmetric 2-design has the same cycle structure
on blocks as it does on points.
Proof: If σ and τ represent the same automorphism acting on blocks and points, respectively, then so do σ^i and τ^i. Let n_i, respectively n′_i, be the number of i-cycles of σ, respectively τ. If i is a prime p, the number of fixed points of σ^p, respectively τ^p, is p · n_p + n_1,
respectively p · n′_p + n′_1. As n_1 = n′_1 by Theorem 8.7.2 applied to σ and τ, and since
p · n_p + n_1 = p · n′_p + n′_1 by the same theorem applied to σ^p and τ^p, n_p = n′_p. We can
continue inductively according to the number of prime factors of i.
Exercise 463 Fill in the details of the proof of Corollary 8.7.3.
We are now ready to introduce the concept of a difference set; this will be intimately
related to cyclic designs. Let G be an abelian group of order v. A (v, k, λ) difference set in G
is a set D of k distinct elements of G such that the multiset {x − y | x ∈ D, y ∈ D, x ≠ y}
contains every nonzero element of G exactly λ times. There are k(k − 1) differences between
distinct elements of D, and these differences make up all nonzero elements of G exactly λ
times. Thus

k(k − 1) = (v − 1)λ.    (8.12)
We will be most interested in the case where G is the cyclic group Zv , in which case
the difference set is a cyclic difference set. In Theorem 8.7.5 we will see the connection
between difference sets and symmetric designs.
Example 8.7.4 In Z7 , the set D = {1, 2, 4} is a (7, 3, 1) cyclic difference set as one can
see by direct computation (Exercise 464). By adding each element i ∈ Z7 to the set D
we obtain the sets Di = {i + 1, i + 2, i + 4}. These form the blocks of a projective plane
D of order 2. The map i → i + 1 (mod 7) is a 7-cycle in Aut(D).
Exercise 464 Prove that in Example 8.7.4:
(a) D = {1, 2, 4} is a (7, 3, 1) cyclic difference set,
(b) the sets Di = {i + 1, i + 2, i + 4} are the blocks of a projective plane D of order 2,
and
(c) the map i → i + 1 (mod 7) is a 7-cycle in Aut(D).
Exercise 465 Prove that:
(a) {1, 3, 4, 5, 9} and {2, 6, 7, 8, 10} are (11, 5, 2) cyclic difference sets, and
(b) {1, 5, 6, 8} and {0, 1, 3, 9} are (13, 4, 1) cyclic difference sets.
For i ∈ G, define the shift gi : G → G by zgi = i + z. It is easy to see that applying gi
to a difference set in G yields a difference set with the same parameters. The shifts of a
(v, k, λ) difference set in G form a symmetric 2-(v, k, λ) design with G in its automorphism
group.
Theorem 8.7.5 Let G be an abelian group of order v, and let Ḡ = {g_i | i ∈ G} be the group of shifts of G. The following hold:
(i) Let D be a (v, k, λ) difference set in G. Let D_i = {i + δ | δ ∈ D} and B = {D_i | i ∈ G}.
Then D = (G, B) is a symmetric 2-(v, k, λ) design with Ḡ ⊆ Aut(D).
(ii) If D = (G, B) is a symmetric 2-(v, k, λ) design which has Ḡ ⊆ Aut(D), then any block
D of B is a (v, k, λ) difference set in G and the blocks of B are D_i = {i + δ | δ ∈ D}.
Proof: Let D be a (v, k, λ) difference set in G with D = (G, B) as described. Let i and j be
distinct elements of G. By definition i − j = x − y for λ pairs (x, y) of distinct elements in
D × D. Then i − x = j − y; as i = i − x + x ∈ D_{i−x} and j = j − y + y = i − x + y ∈
D_{i−x}, every pair of points is in at least λ blocks. By (8.12),

v · k(k − 1)/2 = λ · v(v − 1)/2.
The left-hand side counts the number of pairs of distinct elements in all the Di s and the
right-hand side counts the number of pairs of distinct elements in G repeated λ times each.
Therefore no pair can appear more than λ times in all the Di s. Thus D is a 2-(v, k, λ) design;
since G has order v, D has v blocks and so is symmetric. As (D_j)g_i = D_{i+j}, each shift g_i is an automorphism of D,
proving (i).
For the converse, let D be any block in B. Then D_i = {i + δ | δ ∈ D} is the image of D
under g_i and hence is also in B. Suppose first that D_i = D_j for some i ≠ j. Then D = D_m,
where m = i − j ≠ 0, implying that g_m fixes the block D. By Theorem 8.7.2 g_m must fix
a point. Thus m + ℓ = ℓ for some ℓ ∈ G, which is a contradiction as m ≠ 0. Therefore the
Let w be a nonzero element of G. Then 0 and w are distinct points and so must be in
λ blocks Dℓ . Thus 0 = ℓ + x and w = ℓ + y for some x and y in D. But y − x = w and
hence every nonzero element of G occurs as a difference of elements of D at least λ times.
There are k(k − 1) differences of distinct elements of D. But

b = v = λ · v(v − 1)/(k(k − 1))

by Theorem 8.1.3. Thus k(k − 1) = λ(v − 1) and hence each difference occurs exactly λ
times, proving (ii).
This theorem shows that there is a one-to-one correspondence between difference sets
in G and symmetric designs with points G having a group of automorphisms isomorphic to
G acting naturally on the points and blocks. The design D arising from the difference set D
as described in Theorem 8.7.5 is called the development of D.
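Both directions of Theorem 8.7.5 can be checked on a small example. The sketch below (our illustration, using the (11, 5, 2) cyclic difference set {1, 3, 4, 5, 9} of Exercise 465) verifies the difference-set property and confirms that its development is a symmetric 2-(11, 5, 2) design:

```python
from collections import Counter
from itertools import combinations

v, D = 11, {1, 3, 4, 5, 9}
k, lam = len(D), 2

# difference-set property: every nonzero difference occurs exactly λ = 2 times
diffs = Counter((x - y) % v for x in D for y in D if x != y)
assert all(diffs[a] == lam for a in range(1, v))

# development: the v translates of D form the blocks
blocks = [frozenset((delta + i) % v for delta in D) for i in range(v)]
assert len(set(blocks)) == v            # symmetric: b = v distinct blocks
pairs = Counter(pr for blk in blocks for pr in combinations(sorted(blk), 2))
assert len(pairs) == v * (v - 1) // 2 and set(pairs.values()) == {lam}
print(len(blocks), k, lam)  # 11 5 2
```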
We can use quadratic residues to construct difference sets.
Theorem 8.7.6 ([204]) Let q = 4m − 1 = p^t for a prime p. The set D = {x² | x ∈
F_q, x ≠ 0} is a (4m − 1, 2m − 1, m − 1) difference set in the additive group of F_q.
Proof: As q = 4m − 1 = p^t, p ≡ −1 (mod 4) and t is odd. By Lemma 6.2.4, −1 is not a
square in F_p. Hence the roots of x² + 1 lie in F_{p²} but not in F_p. Thus −1 is not a square in F_q
as F_{p²} ⊄ F_q. Since the nonzero elements of F_q form a cyclic group F_q∗ with generator α, D
consists of the 2m − 1 elements {α^(2i) | 1 ≤ i ≤ 2m − 1}, and (−1)D makes up the remaining nonzero elements (the nonsquares) of F_q. Let M be the multiset {x − y | x ∈ D, y ∈ D,
x ≠ y}. If x − y is a fixed difference in M, then for 1 ≤ i ≤ 2m − 1, α^(2i)(x − y) = (α^i)²x −
(α^i)²y and −α^(2i)(x − y) = (α^i)²y − (α^i)²x are each in M and comprise all 4m − 2 nonzero
elements of F_q. Thus every nonzero element of F_q occurs λ times for some λ, making D a
(4m − 1, 2m − 1, λ) difference set in the additive group of F_q. By (8.12), λ = m − 1.
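For the prime cases q = p, Theorem 8.7.6 is straightforward to check by brute force (a sketch of ours, not part of the text):

```python
from collections import Counter

for q in (7, 11, 19, 23):                 # primes of the form q = 4m - 1
    m = (q + 1) // 4
    D = {pow(x, 2, q) for x in range(1, q)}        # the nonzero squares
    diffs = Counter((x - y) % q for x in D for y in D if x != y)
    assert len(D) == 2 * m - 1
    assert all(diffs[a] == m - 1 for a in range(1, q))
    print((q, len(D), m - 1))             # parameters (4m - 1, 2m - 1, m - 1)
```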
Applying Theorem 8.7.6 to the case q = p = 4m − 1 where p is a prime, we obtain a
cyclic difference set in Z p .
Exercise 466 Use Theorem 8.7.6 to obtain (v, k, λ) cyclic difference sets in Z7 , Z11 , Z19 ,
Z23 , and Z31 . What are their parameters v, k, and λ?
When λ = 1 and G = Zv , the symmetric designs are cyclic projective planes. Since it
is so difficult to prove the nonexistence of projective planes with parameters not covered
by the Bruck–Ryser–Chowla Theorem, it is reasonable to consider planes with additional
conditions. One natural possibility is to examine criteria under which cyclic projective
planes do not exist. Our next two theorems show that if a cyclic projective plane has order µ
where either µ ≡ 2 (mod 4) or µ ≡ ±3 (mod 9), then µ = 2 or µ = 3. These results were
proved in the more general setting of (µ² + µ + 1, µ + 1, 1) difference sets in an abelian
group of order µ² + µ + 1 in [353]. Using duadic codes, we can prove these two theorems
rather easily. The proof of the first is found in [267].
Theorem 8.7.7 Let D be a cyclic projective plane of order µ ≡ 2 (mod 4). Then µ = 2.
Proof: Let A be the incidence matrix of D. By Theorem 8.7.1(ii), D2 (A) is a [v, (v − 1)/2]
self-orthogonal even-like duadic code with splitting µ_{−1}. By Theorems 8.5.3 and 6.4.2,
C 2 (A) is an odd-like duadic code. By Theorem 8.6.3, C 2 (A) has minimum weight µ + 1, with
minimum weight codewords being those whose supports are the lines. By Theorem 4.3.13,
µ_2 ∈ PAut(C 2 (A)) and thus µ_2 permutes the minimum weight codewords. Hence µ_2 ∈
Aut(D). Because µ_2 fixes the coordinate 0, by Theorem 8.7.2, µ_2 fixes a line ℓ and thus
µ_2 fixes a minimum weight codeword c of C 2 (A). But any nonzero binary codeword fixed
by µ_2 must be an idempotent. By Theorem 8.7.5, the lines of D can be obtained by cyclic
shifts of ℓ; as the codewords whose supports are the lines generate C 2 (A), c must be the
generating idempotent. But the generating idempotent of a binary duadic code of length v
has weight (v − 1)/2 or (v + 1)/2 by equations (6.1) and (6.3). Thus µ + 1 = (v ± 1)/2 =
(µ² + µ + 1 ± 1)/2, which implies µ = 2.
A similar argument involving ternary duadic codes yields the following result.
Theorem 8.7.8 Let D be a cyclic projective plane of order µ where 3 | µ but 9 ∤ µ. Then
µ = 3.
Proof: Let A be the incidence matrix of the 2-(v, k, 1) design D of order µ = k − 1.
By Theorem 8.5.3, as 3 ∤ (µ + 1), D3 (A) has dimension (v − 1)/2 and D3 (A) = C 3 (A)⊥ ⊂
C 3 (A) = D3 (A)⊥. Hence by Theorem 6.4.1, D3 (A) is an even-like duadic code with splitting
given by µ_{−1}. By Theorem 6.4.2, C 3 (A) is an odd-like duadic code. By Theorem 4.3.13,
µ_3 ∈ PAut(C 3 (A)) and thus µ_3 permutes the minimum weight codewords; these codewords
are precisely those with supports the lines of the plane by Theorem 6.5.2 and are multiples of
the binary vectors corresponding to these supports. Hence µ_3 ∈ Aut(D). Because µ_3 fixes
the coordinate 0, by Theorem 8.7.2, µ_3 fixes a line ℓ and thus µ_3 fixes a minimum weight
codeword c of C 3 (A) with supp(c) = ℓ. Writing the codeword c as the polynomial c(x)
in Rv = F3[x]/(x^v − 1), we have c(x) = c(x)µ_3 = c(x³) = c(x)³ as Rv has characteristic
3. Thus c(x)⁴ = c(x)², implying that e(x) = c(x)² is an idempotent. Let C′ be the cyclic
code generated by e(x) and all of its cyclic shifts. As c(x)e(x) = c(x)³ = c(x), C′ contains
c(x), all of its cyclic shifts, and their scalar multiples. But these are all the minimum
weight codewords of C 3 (A) by Theorem 6.5.2; hence C′ = C 3 (A), implying that e(x) is the
generating idempotent of C 3 (A).
As e(x) and e(x)µ_{−1} are the generating idempotents of an odd-like pair of duadic codes,
by Exercise 329 e(x) + e(x)µ_{−1} = 1 + j(x). Since v ≡ 1 (mod 3) as 3 | µ, j(x) = 1 +
x + x² + · · · + x^(v−1) and so

e(x) + e(x^(−1)) = 2 + x + x² + · · · + x^(µ²+µ).    (8.13)

We examine how each of the terms in e(x) = c(x)² = Σ_{m=0}^{v−1} e_m x^m can arise in the expansion
of c(x)². The term e_m x^m is the sum of terms of the form x^i x^j where i and j are elements
of ℓ. Suppose that x^i x^j = x^{i′} x^{j′} with i ≠ i′. Then i − i′ ≡ j′ − j (mod v), implying that
i = j′ and j = i′ since the difference i − i′ modulo v can be obtained only once because
ℓ is a (v, k, 1) difference set by Theorem 8.7.5. Therefore e(x) has exactly µ + 1 terms
x^(2i) coming from the product x^i x^i and µ(µ + 1)/2 terms 2x^(i+j) coming from x^i x^j + x^j x^i,
where i ≠ j. Thus e(x) has exactly µ + 1 terms with coefficient 1 and µ(µ + 1)/2 terms with
coefficient 2. The same holds for e(x)µ_{−1}. In order for (8.13) to hold, since the coefficient
of x^0 in both e(x) and e(x)µ_{−1} must agree, the coefficient of x^0 in both e(x) and e(x)µ_{−1}
must be 1. Furthermore, e(x) and e(x)µ_{−1} must have exactly the same set of terms x^k
with coefficients equal to 2; these terms when added in e(x) + e(x)µ_{−1} have coefficient
equal to 1. Finally, whenever the coefficient of x^k, with k ≠ 0, in e(x) (or e(x)µ_{−1}) is 1,
then the coefficient of x^k in e(x)µ_{−1} (respectively, e(x)) is 0. This accounts for all terms of
e(x) + e(x)µ_{−1}, and so there is a total of µ(µ + 1)/2 + 2(µ + 1) − 1 terms, which must
therefore equal v = µ² + µ + 1. Solving gives µ² − 3µ = 0; thus µ = 3.
Another result, found in [266], eliminates certain orders for the existence of cyclic projective planes. To prove it, we need the following two lemmas.
Lemma 8.7.9 Let v be an odd positive integer relatively prime to the integer b. Suppose v
has the prime factorization v = ∏_i p_i^{a_i}, where the p_i are distinct odd primes. We have b^e ≡ −1
(mod v) for some odd positive integer e if and only if, for all i, b^{e_i} ≡ −1 (mod p_i) for some
odd positive integer e_i.
Proof: If b^e ≡ −1 (mod v) for some odd positive integer e, then clearly we may choose
e_i = e.
The converse requires more work. By Exercise 467, as b^{e_i} ≡ −1 (mod p_i), there is an
odd positive integer f_i such that b^{f_i} ≡ −1 (mod p_i^{a_i}). Therefore the converse is proved if
we show that whenever n_1 and n_2 are relatively prime odd integers such that, for i = 1 and
2, b^{g_i} ≡ −1 (mod n_i) where g_i is odd, then b^{g_1 g_2} ≡ −1 (mod n_1 n_2). As g_i is odd, b^{g_1 g_2} ≡
(−1)^{g_2} ≡ −1 (mod n_1) and b^{g_1 g_2} ≡ (−1)^{g_1} ≡ −1 (mod n_2), implying n_i | (b^{g_1 g_2} + 1) and
hence n_1 n_2 | (b^{g_1 g_2} + 1) because n_1 and n_2 are relatively prime.
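The composition step in this proof is easy to check numerically; the sketch below (with the illustrative choices b = 2 and v = 33 = 3 · 11, which are not taken from the text) finds the promised odd exponents.

```python
# Numerical check of Lemma 8.7.9 for b = 2 and v = 33 = 3 * 11
# (illustrative values chosen here, not from the text).

def odd_e_with_b_pow_neg1(b, m, bound=1000):
    """Smallest odd positive e with b^e = -1 (mod m), or None."""
    for e in range(1, bound, 2):          # odd exponents only
        if pow(b, e, m) == m - 1:
            return e
    return None

b, v = 2, 33
# The condition holds for each odd prime factor of v ...
e1 = odd_e_with_b_pow_neg1(b, 3)    # 2^1 = 2 = -1 (mod 3)
e2 = odd_e_with_b_pow_neg1(b, 11)   # 2^5 = 32 = -1 (mod 11)
# ... so the lemma guarantees an odd e with 2^e = -1 (mod 33).
e = odd_e_with_b_pow_neg1(b, v)
print(e1, e2, e)  # 1 5 5
```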
Exercise 467 Let p be an odd prime and t a positive integer.
(a) Prove that if x is an integer, then (−1 + x p)^{p^{t−1}} = −1 + y p^t for some integer y.
(Compare this to Exercise 337.)
(b) Prove that if b^e ≡ −1 (mod p) for some odd integer e, there is an odd integer f such
that b^f ≡ −1 (mod p^t).
Lemma 8.7.10 Let p be an odd prime.
(i) If p ≡ 3 (mod 8), then 2^e ≡ −1 (mod p) for some odd positive integer e.
(ii) If p ≡ 1 (mod 8) and ord_p(2) = 2e for some odd positive integer e, then 2^e ≡
−1 (mod p).
(iii) If p ≡ −3 (mod 8), then 4^e ≡ −1 (mod p) for some odd positive integer e.
(iv) If p ≡ 1 (mod 8) and ord_p(2) = 4e for some odd positive integer e, then 4^e ≡ −1
(mod p).
Proof: If p ≡ 3 (mod 8), then by Lemma 6.2.6, ord_p(2) = 2e for some odd positive integer
e. So under the conditions of (i) or (ii), 2^e is a solution of x^2 − 1 modulo p; as this polynomial
has only two solutions ±1 and 2^e ≢ 1 (mod p), we must have 2^e ≡ −1 (mod p), yielding (i)
and (ii). Parts (iii) and (iv) follow analogously since if p ≡ −3 (mod 8), then ord_p(2) = 4e
for some odd positive integer e by Lemma 6.2.6, implying ord_p(4) = 2e.
Let p be an odd prime. We say that p is of type A if either p ≡ 3 (mod 8) or p ≡
1 (mod 8) together with ord_p(2) ≡ 2 (mod 4). We say that p is of type B if either p ≡ −3
(mod 8) or p ≡ 1 (mod 8) together with ord_p(2) ≡ 4 (mod 8).
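The definitions above translate directly into a short computation. The sketch below (illustrative, with a hand-rolled multiplicative-order function rather than any library routine) classifies a few small primes and confirms the conclusions of Lemma 8.7.10 for them.

```python
def mult_order(a, p):
    """Multiplicative order of a modulo p (p prime, p does not divide a)."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def prime_type(p):
    """Classify an odd prime as type 'A', type 'B', or neither."""
    if p % 8 == 3:
        return 'A'
    if p % 8 == 5:                 # p = -3 (mod 8)
        return 'B'
    if p % 8 == 1:
        d = mult_order(2, p)
        if d % 4 == 2:
            return 'A'
        if d % 8 == 4:
            return 'B'
    return None

# Lemma 8.7.10 check: type A gives odd e with 2^e = -1 (mod p),
# type B gives odd e with 4^e = -1 (mod p).
for p in [3, 5, 11, 13, 19, 29, 37, 41, 43, 53]:
    t = prime_type(p)
    if t == 'A':
        e = mult_order(2, p) // 2
        assert e % 2 == 1 and pow(2, e, p) == p - 1
    elif t == 'B':
        e = mult_order(4, p) // 2
        assert e % 2 == 1 and pow(4, e, p) == p - 1
```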
Theorem 8.7.11 Suppose that µ is even and v = µ^2 + µ + 1 = ∏_i p_i^{a_i}, where the p_i are
either all of type A or all of type B. Then there is no cyclic projective plane of order µ.
Proof: The proof is by contradiction. Suppose that there is a cyclic projective plane of
order µ. Let A be the incidence matrix of the plane. By Theorem 8.5.2, the code D4 (A)
is self-orthogonal under the ordinary inner product and has dimension at most (v − 1)/2.
The code C 4 (A) is odd-like as the blocks have odd size and, by Theorem 8.5.2(i), D4 (A)
must have codimension 1 in C 4 (A). The code C 4 (A) is cyclic as the plane is cyclic. As the
sum of the entries in a column of A is µ + 1, adding the rows of A in F4 gives the all-one
vector 1. Thus C 4 (A) = D4 (A) + 1. By Corollary 4.4.12,
C 4 (A) ∩ C 4 (A)µ_{−1} = 1.    (8.14)
First, assume that all the primes p_i are of type A. By Lemmas 8.7.9 and 8.7.10, there is an
odd positive integer e such that 2^e ≡ −1 (mod v). So 4^{(e−1)/2} · 2 ≡ −1 (mod v), implying that
if C is a 4-cyclotomic coset modulo v, then Cµ_{−1} = Cµ_2 as multiplying a 4-cyclotomic
coset by 4 fixes the coset. Therefore C 4 (A)µ_{−1} = C 4 (A)µ_2 as both must have the same
defining set. As C 2 (A) and C 2 (A)µ_2 have the same defining set, since µ_2 fixes 2-cyclotomic
327
8.7 Cyclic projective planes
cosets, C 2 (A) = C 2 (A)µ_2. Thus we obtain
C 2 (A) = C 2 (A) ∩ C 2 (A)µ_2 ⊆ C 4 (A) ∩ C 4 (A)µ_2 = 1
by (8.14), an obvious contradiction as C 2 (A) contains binary vectors whose supports are
the lines of the plane.
Now assume that all the primes pi are of type B. As D4 (A) has a generating matrix
consisting of binary vectors, it is self-orthogonal under the Hermitian inner product as well
as the ordinary inner product (see the related Exercise 360). As C 4 (A) = D4 (A) + 1, by
Corollary 4.4.17,
C 4 (A) ∩ C 4 (A)µ_{−2} = 1.    (8.15)
By Lemmas 8.7.9 and 8.7.10, 4^e ≡ −1 (mod v) for some odd positive integer e. So
4^e · 2 ≡ −2 (mod v), implying that if C is a 4-cyclotomic coset modulo v, then Cµ_{−2} = Cµ_2.
Hence C 4 (A)µ_{−2} = C 4 (A)µ_2. Again we obtain
C 2 (A) = C 2 (A) ∩ C 2 (A)µ_2 ⊆ C 4 (A) ∩ C 4 (A)µ_2 = 1
by (8.15), which is a contradiction.
Exercise 468 Using Corollary 8.6.13 of the Bruck–Ryser–Chowla Theorem and Theorems 8.7.7, 8.7.8, and 8.7.11, prove that there do not exist cyclic projective planes of
non-prime power order µ ≤ 50 except possibly for µ = 28, 35, 36, 44, or 45.
We conclude this section with a brief discussion of multipliers for difference sets. A
multiplier of a difference set D in an abelian group G is an automorphism τ of G such
that Dτ = {i + δ | δ ∈ D} for some i ∈ G. Notice that a multiplier of a difference set is
an automorphism of the development of the difference set. In the case where G = Zv, the
automorphisms of Zv are the maps µ_a : Zv → Zv given by iµ_a = ia for all i ∈ Zv; here a
is relatively prime to v. These are precisely the maps we called multipliers in our study of
cyclic codes in Section 4.3. We can prove a multiplier theorem for cyclic difference sets by
appealing to the theory of cyclic codes.
Theorem 8.7.12 ([287]) Let D be a (v, k, λ) cyclic difference set. Suppose that p is a prime
with p ∤ v and p | µ = k − λ. Assume that p > λ. Then µ_p is a multiplier of D.
Proof: Let 𝒟 be the development of D. Then 𝒟 is a 2-(v, k, λ) symmetric design by
Theorem 8.7.5. If 𝒟 has incidence matrix A, then C p (A) is a cyclic code. By Theorem 4.3.13,
µ_p is an automorphism of C p (A). Let c_D, respectively c_{Dµ_p}, be the vector in F_p^v whose
nonzero components are all 1 and whose support is D, respectively Dµ_p. By construction
c_D ∈ C p (A). As µ_p is a permutation automorphism of C p (A), c_{Dµ_p} = c_D µ_p ∈ C p (A). By
Theorem 8.5.5, as p > λ, Dµ_p is a block of 𝒟; by Theorem 8.7.5, Dµ_p = {i + δ | δ ∈ D}
for some i, and µ_p is therefore a multiplier of D.
In the proof of this theorem, we specifically needed p > λ in order to invoke Theorem 8.5.5. However, for every known difference set and every prime divisor p of k − λ
relatively prime to v, µ_p is a multiplier of the difference set even if p ≤ λ. A corollary
of this theorem, presented shortly, will illustrate its significance. To state the corollary, we
328
Designs
need a preliminary definition and lemma. A subset of an abelian group G is normalized
provided its entries sum to 0.
Lemma 8.7.13 Let D be a (v, k, λ) difference set in an abelian group G with gcd(v, k) = 1.
Then the development 𝒟 of D contains a unique normalized block.
Proof: Suppose that the elements of D sum to d. Let D_i = {i + δ | δ ∈ D} be an arbitrary
block of 𝒟. Then the entries in D_i sum to d + ki. As gcd(v, k) = 1, there is a unique
solution i ∈ G of d + ki = 0.
Corollary 8.7.14 ([204]) Let 𝒟 = (Zv, B) be a cyclic symmetric 2-(v, k, λ) design of order
µ = k − λ with gcd(v, k) = 1. Then the normalized block in B is a union of p-cyclotomic
cosets modulo v for any prime p | µ with p > λ and p ∤ v.
Proof: If D is any block in B, it is a cyclic difference set in Zv and 𝒟 is the development of D
by Theorem 8.7.5. By Lemma 8.7.13, we may assume D is normalized. By Theorem 8.7.12,
Dµ_p is a block in B for any prime p | µ with p > λ and p ∤ v. Clearly, Dµ_p is normalized;
by uniqueness in Lemma 8.7.13, Dµ_p = D. Thus if i is in D, then ip modulo v is also in
D, implying that D is the union of p-cyclotomic cosets modulo v.
Example 8.7.15 Let 𝒟 = (Z13, B) be a cyclic projective plane of order µ = 3. Then 𝒟 is a
symmetric 2-(13, 4, 1) design, and its normalized block D is a union of 3-cyclotomic cosets
modulo 13. These cosets are C_0 = {0}, C_1 = {1, 3, 9}, C_2 = {2, 5, 6}, C_4 = {4, 10, 12},
and C_7 = {7, 8, 11}. As blocks have size 4, D must be equal to one of C_0 ∪ C_i for some
i ∈ {1, 2, 4, 7}. Note that each of C_0 ∪ C_i is indeed a cyclic difference set; this same fact
is explored in Exercise 359 via duadic codes. By Exercise 469, Dµ_a for a relatively prime
to v is also a (13, 4, 1) difference set and 𝒟µ_a is the development of Dµ_a. Hence as
C_1µ_2 = C_2, C_1µ_4 = C_4, and C_1µ_7 = C_7, the four possibilities for D all produce equivalent cyclic designs. Thus up to equivalence there is only one cyclic projective plane of
order 3.
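Example 8.7.15 can be confirmed mechanically; the following sketch (illustrative, not from the text) checks that C_0 ∪ C_1 = {0, 1, 3, 9} is a (13, 4, 1) difference set that the multiplier µ_3 fixes, as Corollary 8.7.14 with p = 3 predicts.

```python
# Check that D = C0 ∪ C1 = {0, 1, 3, 9} is a (13, 4, 1) cyclic difference
# set and that µ_3 (multiplication by 3 mod 13) maps D onto itself.
from collections import Counter

v, D = 13, {0, 1, 3, 9}
diffs = Counter((d1 - d2) % v for d1 in D for d2 in D if d1 != d2)
# Every nonzero residue occurs exactly λ = 1 time.
assert all(diffs[r] == 1 for r in range(1, v))
# D is normalized (entries sum to 0 mod 13) and fixed by µ_3.
assert sum(D) % v == 0
assert {(3 * d) % v for d in D} == D
print("D is a (13, 4, 1) difference set fixed by µ_3")
```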
Exercise 469 Let a be relatively prime to v. Let D be a (v, k, λ) cyclic difference set
in Zv.
(a) Prove that Dµ_a is also a cyclic difference set in Zv.
(b) Prove that if 𝒟 is the development of D, then 𝒟µ_a is the development of Dµ_a.
Exercise 470 Prove that up to equivalence there is a unique cyclic projective plane of
order 4. Also give a (21, 5, 1) difference set for this plane.
Exercise 471 Prove that up to equivalence there is a unique cyclic symmetric 2-(11, 5, 2)
design. Give an (11, 5, 2) difference set for this design.
Exercise 472 Prove that up to equivalence there is a unique cyclic projective plane of
order 5. Also give a (31, 6, 1) difference set for this plane.
Example 8.7.16 Using Corollary 8.7.14 we can show that there is no (µ^2 + µ + 1, µ +
1, 1) cyclic difference set and hence no cyclic projective plane of order µ with 10 | µ.
Suppose D is such a difference set. By Corollary 8.7.14, we may assume that D is normalized
with multipliers µ_2 and µ_5, since λ = 1 and neither 2 nor 5 divides µ^2 + µ + 1. Let x ∈ D
with x ≠ 0. As µ_2 and µ_5 are multipliers of D, 2x, 4x, and 5x are elements of D. If
4x ≠ x, then the difference x = 2x − x = 5x − 4x occurs twice as a difference of distinct
elements of D, contradicting λ = 1. There is an x ∈ D with 4x ≠ x as follows. If 4x = x,
then 3x = 0. This has at most three solutions modulo µ^2 + µ + 1; since D has µ + 1 ≥ 11
elements, at least eight of them satisfy 4x ≠ x.
Exercise 473 Show that there does not exist a cyclic projective plane of order µ whenever µ is divisible by any of 6, 14, 15, 21, 22, 26, 33, 34, 35, 38, 46, 51, 57, 62, 87,
and 91.
Exercise 474 Show that there does not exist a cyclic projective plane of order µ where µ
is divisible by 55. Hint: You may encounter the equation 120x = 0 in Z_{µ^2+µ+1}. If so, show
that it has at most 24 solutions.
Exercise 475 As stated at the beginning of this section, if µ is a power of a prime, then
PG(2, µ) is a cyclic projective plane of order µ. In this problem we eliminate nonprime
power orders µ up to 100.
(a) What nonprime power orders µ with 2 ≤ µ ≤ 100 are not eliminated as orders of cyclic
projective planes based on Example 8.7.16 and Exercises 473 and 474?
(b) Which orders from part (a) cannot be eliminated by Theorems 8.7.7 and 8.7.8?
(c) Of those nonprime power orders left in (b), which cannot be eliminated by either the
Bruck–Ryser–Chowla Theorem or Theorem 8.7.11?
With the techniques of this chapter and others, it has been shown that there is no cyclic
projective plane of order µ ≤ 3600 unless µ is a prime power.
Exercise 476 Do the following:
(a) Find all normalized (19, 9, 4) difference sets.
(b) Find all normalized (37, 9, 2) difference sets.
(c) Show that there does not exist a cyclic symmetric 2-(31, 10, 3) design (even though
(8.12) holds).
8.8
The nonexistence of a projective plane of order 10
We now discuss the search for the projective plane of order 10 and give an indication of how
coding theory played a major role in showing the nonexistence of this plane. (See [181] for
a survey of this problem.)
Suppose that D is a projective plane of order 10 with incidence matrix A. By Theorem 8.6.4, C 2 (A) is a [111, 56, 11] code where all codewords have weights congruent to either 0 or 3 modulo 4. Furthermore, all the codewords in C 2 (A) of weight 11
have supports that are precisely the lines of the plane by Theorem 8.6.3. In addition,
C 2 (A) contains the all-one codeword, and the extended code Ĉ 2 (A) is self-dual and
doubly-even. From this we deduce that if Ai = Ai (C 2 (A)), then A0 = A111 = 1 and
A11 = A100 = 111. Also,
Ai = 0 if 1 ≤ i ≤ 10 and 101 ≤ i ≤ 110. Furthermore, Ai = 0 unless i ≡ 0 (mod 4) or i ≡
3 (mod 4). If three further Ai are known, then by Theorem 7.3.1, the weight distribution of
C 2 (A) would be uniquely determined. A succession of authors [220, 48, 184, 185] showed
that A15 = A12 = A16 = 0. In each case when a vector of weight 15, 12, or 16 was assumed
to be in C 2 (A), a contradiction was reached before a projective plane could be constructed
(by exhaustive computer search). Once these values for Ai were determined to be 0, the
unique weight distribution of C 2 (A) was computed and all the entries, many of astronomical
size, turned out to be nonnegative integers. It also turned out that A19 was nonzero. Then
C. W. H. Lam tried to construct D where C 2 (A) contained a codeword of weight 19 or
show that A19 = 0. When he got the latter result, he concluded that there is no plane of
order 10 [186].
We give a very short description of what was involved in showing that C 2 (A) could not
contain any vectors of weights 15, 12, or 16. In the first paper on this topic, MacWilliams,
Sloane, and Thompson [220] assume that C 2 (A) contains a vector x of weight 15. As
the extended code Ĉ 2 (A) is doubly-even, each line must meet supp(x) in an odd number
of points s. The minimum
weight of C 2 (A) excludes values of s higher than 7. The value 7 can be eliminated since a
vector y ∈ C 2 (A) of weight 11 that meets supp(x) in seven points when added to x gives a
vector of weight 12 (hence the support of an oval by Corollary 8.6.7) that meets a line in
four points, something that cannot happen by Theorem 8.6.5. Let bi denote the number of
lines that meet supp(x) in i points for i = 1, 3, and 5. Then we have:
b1 + b3 + b5 = 111,
b1 + 3b3 + 5b5 = 165,
3b3 + 10b5 = 105.
The first equation counts the 111 lines in C 2 (A). Each of the 15 points of supp(x) lies on 11
lines, and the second equation counts the incidences of points of supp(x) on lines. Each of
the 105 pairs of points on supp(x) determines a line and the last equation counts these. The
solution to this system is b1 = 90, b3 = 15, and b5 = 6. It is a short step [220] to show that
the six lines that meet supp(x) in five points are uniquely determined (where “uniquely”
means up to permutation of coordinates). A bit more analysis after this gives (uniquely) the
15 lines that meet supp(x) in three points. It requires much more analysis and the use of a
computer to show that the remaining 90 lines cannot be constructed.
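The displayed system for b1, b3, and b5 can be checked mechanically; the snippet below (numpy is our illustrative choice, not part of the original analysis) reproduces b1 = 90, b3 = 15, b5 = 6.

```python
# Solve the three counting equations for b1, b3, b5.
import numpy as np

A = np.array([[1, 1, 1],      # b1 + b3 + b5 = 111 lines
              [1, 3, 5],      # point-line incidences: 15 * 11 = 165
              [0, 3, 10]])    # pairs of points: C(15, 2) = 105
b = np.array([111, 165, 105])
sol = np.linalg.solve(A, b)   # gives b1 = 90, b3 = 15, b5 = 6
assert np.allclose(sol, [90, 15, 6])
```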
The analysis of how the lines must meet a vector of a presumed weight, the number of
cases to be considered, and the amount of computer time needed increases as one assumes
that C 2 (A) contains a codeword of weight 12 [48, 184], weight 16 [185], and, the final case,
weight 19 [186]. The last calculation used the equivalent of 800 days on a VAX-11/780 and
3000 hours on a Cray-1S. This unusual amount of computing prompted the authors [186]
to analyze where possible errors could occur and the checks used. Solution of the following
problem would be very interesting.
Research Problem 8.8.1 Find a proof of the nonexistence of the projective plane of order
10 without a computer or with an easily reproducible computer program.
8.9
Hadamard matrices and designs
There are many combinatorial configurations that are related to designs. One of these,
Hadamard matrices, appears in codes in various ways. An n × n matrix H all of whose
entries are ±1 which satisfies

H H^T = n I_n,    (8.16)

where I_n is the n × n identity matrix, is called a Hadamard matrix of order n.
Exercise 477 Show that two different rows of a Hadamard matrix of order n > 1 are
orthogonal to each other; this orthogonality can be considered over any field including the
real numbers.
Exercise 478 Show that a Hadamard matrix H of order n satisfies
H^T H = n I_n.
Note that this shows that the transpose of a Hadamard matrix is also a Hadamard
matrix.
Exercise 479 Show that two different columns of a Hadamard matrix of order n > 1 are
orthogonal to each other; again this orthogonality can be considered over any field including
the real numbers.
Example 8.9.1 The matrix H1 = [1] is a Hadamard matrix of order 1. The matrix

H2 = [ 1   1 ]
     [ 1  −1 ]

is a Hadamard matrix of order 2.
Two Hadamard matrices are equivalent provided one can be obtained from the other
by a combination of row permutations, column permutations, multiplication of some rows
by −1, and multiplication of some columns by −1.
Exercise 480 Show that if H and H ′ are Hadamard matrices of order n, then they are
equivalent if and only if there are n × n permutation matrices P and Q and n × n diagonal
matrices D1 and D2 with diagonal entries ±1 such that
H ′ = D1 P H Q D2 .
Exercise 481 Show that, up to equivalence, H1 and H2 as given in Example 8.9.1 are the
unique Hadamard matrices of orders 1 and 2, respectively.
Note that any Hadamard matrix is equivalent to at least one Hadamard matrix whose
first row and first column consists of 1s; such a Hadamard matrix is called normalized. As
we see next, the order n of a Hadamard matrix is divisible by four except when n = 1 or
n = 2.
Theorem 8.9.2 If H is a Hadamard matrix of order n, then either n = 1, n = 2, or n ≡
0 (mod 4).
Proof: Assume that n ≥ 3 and choose H to be normalized. By (8.16), each of the other
rows of H must have an equal number of 1s and −1s. In particular, n must be even. By
permuting columns of H , we may assume that the first three rows of H are

1 · · · 1    1 · · · 1     1 · · · 1     1 · · · 1
1 · · · 1    1 · · · 1    −1 · · · −1   −1 · · · −1
1 · · · 1   −1 · · · −1    1 · · · 1    −1 · · · −1
\___ a ___/ \___ b ___/   \___ c ___/   \___ d ___/

where a, b, c, and d denote the number of columns in each of the four blocks. Hence
a + b = c + d = n/2, a + c = b + d = n/2, and a + d = b + c = n/2.
Solving gives a = b = c = d = n/4, completing the proof.
Even though Hadamard matrices are known for many values of n ≡ 0 (mod 4), including
some infinite families, the existence of Hadamard matrices for any such n is still an open
question. Hadamard matrices have been classified for modest orders. They are unique, up to
equivalence, for orders 4, 8, and 12; up to equivalence, there are five Hadamard matrices of
order 16, three of order 20, 60 of order 24, and 487 of order 28; see [155, 169, 170, 171, 330].
Exercise 482 Show that, up to equivalence, there is a unique Hadamard matrix of
order 4.
One way to construct Hadamard matrices is by using the tensor product. Let A = [a_{i,j}]
be an n × n matrix and B an m × m matrix. Then the tensor product of A and B is the
nm × nm matrix

        [ a_{1,1}B  a_{1,2}B  · · ·  a_{1,n}B ]
A ⊗ B = [ a_{2,1}B  a_{2,2}B  · · ·  a_{2,n}B ]
        [    ···       ···             ···   ]
        [ a_{n,1}B  a_{n,2}B  · · ·  a_{n,n}B ]
Example 8.9.3 Let H4 = H2 ⊗ H2, where H2 is given in Example 8.9.1. Then

     [ 1   1   1   1 ]
H4 = [ 1  −1   1  −1 ]
     [ 1   1  −1  −1 ]
     [ 1  −1  −1   1 ]

is easily seen to be a Hadamard matrix. Notice that by Exercise 482, this is the unique
Hadamard matrix of order 4.
This example illustrates the following general result.
Theorem 8.9.4 If L and M are Hadamard matrices of orders ℓ and m, respectively, then
L ⊗ M is a Hadamard matrix of order ℓm.
Exercise 483 Prove Theorem 8.9.4.
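Theorem 8.9.4 is easy to exercise computationally, since numpy's np.kron computes exactly the tensor product defined above. The sketch below (illustrative, not from the text) rebuilds H4 and H8 and checks (8.16).

```python
# Tensor-product construction of Hadamard matrices via np.kron.
import numpy as np

H2 = np.array([[1, 1],
               [1, -1]])

def is_hadamard(H):
    """Check H H^T = n I_n for a ±1 matrix H."""
    n = H.shape[0]
    return np.array_equal(H @ H.T, n * np.eye(n, dtype=int))

H4 = np.kron(H2, H2)          # Theorem 8.9.4 with L = M = H2
H8 = np.kron(H2, H4)          # the Sylvester matrix of order 8
assert is_hadamard(H2) and is_hadamard(H4) and is_hadamard(H8)
```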
Let H2 be the Hadamard matrix of Example 8.9.1. For m ≥ 2, define H_{2^m} inductively by
H_{2^m} = H2 ⊗ H_{2^{m−1}}.
By Theorem 8.9.4, H_{2^m} is a Hadamard matrix of order 2^m. In particular, we have the
following corollary.
Corollary 8.9.5 There exist Hadamard matrices of all orders n = 2^m for m ≥ 1.
The Hadamard matrices H_{2^m} can be used to generate the first order Reed–Muller codes.
In H2, replace each entry equal to 1 with 0 and each entry equal to −1 with 1. Then we get
the matrix

[ 0  0 ]
[ 0  1 ].    (8.17)
The rows of this matrix can be considered as binary vectors. The complement of a binary
vector c is the binary vector c + 1, where 1 is the all-one vector; so the complement of
c is the vector obtained from c by replacing 1 by 0 and 0 by 1. The two rows of matrix
(8.17) together with their complements are all the vectors in F_2^2. Apply the same process to
H4 from Example 8.9.3. Then the four rows obtained plus their complements give all the
even weight vectors in F_2^4. Exercise 484 shows what happens when this process is applied
to H8.
Exercise 484 Show that the binary vectors of length 8 obtained from H8 by replacing 1 with
0 and −1 with 1 together with their complements are the codewords of the Reed–Muller
code R(1, 3). Note that R(1, 3) is also the [8, 4, 4] Hamming code Ĥ_3.
Theorem 8.9.6 The binary vectors of length 2^m obtained from H_{2^m} by replacing 1 with 0
and −1 with 1 together with their complements are the codewords of R(1, m).
Proof: In Section 1.10, R(1, m) is defined using the (u | u + v) construction involving
R(1, m − 1) and R(0, m − 1). As R(0, m − 1) consists of the zero codeword and all-one
codeword of length 2^{m−1}, we have, from (1.7), that

R(1, m) = {(u, u) | u ∈ R(1, m − 1)} ∪ {(u, u + 1) | u ∈ R(1, m − 1)}.    (8.18)

We prove the result by induction on m, the cases m = 1 and 2 being described previously,
and the case m = 3 verified in Exercise 484. By definition, H_{2^m} = H2 ⊗ H_{2^{m−1}}, and so the
binary vectors obtained by replacing 1 with 0 and −1 with 1 in H_{2^m} together with their
complements are precisely the vectors (u, u) and (u, u + 1), where u is a vector obtained
by replacing 1 with 0 and −1 with 1 in H_{2^{m−1}}, together with their complements. By (8.18)
and our inductive hypothesis that the result is true for R(1, m − 1), we see that the result
is true for R(1, m).
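Theorem 8.9.6 can be checked for small m. The following sketch (an illustration under our own naming, not the book's code) builds the vectors from H_{2^m} and verifies that for m = 3 they form a linear code with the weight distribution of R(1, 3).

```python
# Build the codewords of R(1, m) from the Sylvester Hadamard matrix.
import numpy as np
from itertools import product

def rm1_from_hadamard(m):
    """Rows of H_{2^m} mapped 1 -> 0, -1 -> 1, plus their complements."""
    H = np.array([[1]])
    H2 = np.array([[1, 1], [1, -1]])
    for _ in range(m):
        H = np.kron(H2, H)
    rows = (1 - H) // 2                      # 1 -> 0, -1 -> 1
    return {tuple(r) for r in rows} | {tuple(1 - r) for r in rows}

C = rm1_from_hadamard(3)
assert len(C) == 16                          # |R(1, 3)| = 2^(3+1)
# Closed under addition mod 2, i.e. the set is a binary linear code.
for u, v in product(C, repeat=2):
    assert tuple((a + b) % 2 for a, b in zip(u, v)) in C
# Nonzero weights are 4 and 8, matching the [8, 4, 4] code of Exercise 484.
assert {sum(w) for w in C} == {0, 4, 8}
```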
It is usually not true that changing the rows of an order n Hadamard matrix to binary
vectors and their complements, as just described, produces a binary linear code. However,
it does obviously produce a possibly nonlinear binary code of 2n vectors of length n such
that the distance between distinct codewords is n/2 or more. Using this, Levenshtein [193]
has shown that if Hadamard matrices of certain orders exist, then the binary codes obtained
from them meet the Plotkin Bound.
Exercise 485 Suppose that H is a Hadamard matrix of order n = 4s.
(a) By examining the proof of Theorem 8.9.2, we see that three rows, including the all-one
row, have common 1s in exactly s columns. Prove that if we take any three rows of H ,
then there are exactly s columns where all the entries in these rows are the same (some
columns have three 1s and some columns have three −1s). Hint: Rescale the columns
so that one of the rows is the all-one row.
(b) Prove that if we take any three columns of H , then there are exactly s rows where all
the entries in these columns are the same (some rows have three 1s and some rows have
three −1s).
(c) Prove that if H is normalized and if we take any three columns of H , then there are
exactly s − 1 rows excluding the first row where all the entries in these columns are the
same (some rows have three 1s and some rows have three −1s).
Now let H be a normalized Hadamard matrix of order n = 4s. Delete the first row. Each
of the remaining 4s − 1 rows has 2s 1s and 2s −1s. The 4s coordinates are considered to
be points. To each of the remaining rows associate two blocks of size 2s: one block is the
set of points on which the row is 1, and the other block is the set of points on which the row
is −1. Denote this set of points and blocks D(H ). There are 8s − 2 blocks and they turn
out to form a 3-(4s, 2s, s − 1) design, called a Hadamard 3-design, as proved in Theorem
8.9.7.
Exercise 486 Show that in the point-block structure D(H_{2^m}), the blocks are precisely the
supports of the minimum weight codewords of R(1, m). Then using Exercise 456, show
that D(H_{2^m}) is indeed a 3-design. Note that this exercise shows that the Hadamard structure
obtained from H_{2^m} is a 3-design without appealing to Theorem 8.9.7.
Theorem 8.9.7 If H is a normalized Hadamard matrix of order n = 4s, then D(H ) is a
3-(4s, 2s, s − 1) design.
Proof: The proof follows from Exercise 485(c).
Exercise 487 Find the Pascal triangle for a 3-(4s, 2s, s − 1) design.
Exercise 488 Prove that the derived design, with respect to any point, of any 3-(4s, 2s,
s − 1) design is a symmetric 2-(4s − 1, 2s − 1, s − 1) design.
Exercise 489 Prove that up to equivalence the Hadamard matrix of order 8 is unique. Hint:
Use Example 8.2.4.
The converse of Theorem 8.9.7 is also true.
Theorem 8.9.8 Let D = (P, B) be a 3-(4s, 2s, s − 1) design. Then there is a normalized
Hadamard matrix H such that D = D(H ).
Proof: Suppose that B and B ′ are distinct blocks of B that intersect in a point x. By
Exercise 488 the derived design D x of D with respect to x is a symmetric 2-(4s − 1,
2s − 1, s − 1) design. By Lemma 8.5.1, any two blocks of D x intersect in s − 1 points.
This implies that any two blocks of B are either disjoint or intersect in s points. As D x has
4s − 1 blocks, every point is in 4s − 1 blocks of D.
Fix a block B in B and define
X = {(x, B′) | x ∈ B, x ∈ B′ ∈ B, and B′ ≠ B}.
We count the size of X in two different ways. There are 2s points x in B. For each such
x there are 4s − 1 blocks, including B, containing x. Therefore |X| = 2s(4s − 2). Let
N be the number of blocks B′ ≠ B with B′ ∩ B nonempty. Then as B′ ∩ B contains s
points, |X| = sN. Equating the two counts gives N = 8s − 4. Thus there are 8s − 3 blocks,
including B, which have a point in common with B. Therefore there is exactly one block
in B disjoint from B; as each block contains half the points, this disjoint block must be the
complement. So the blocks of B come in complementary pairs.
Fix a point x and consider the set S of 4s − 1 blocks containing x. The remaining blocks
are the complements of the blocks in S. Order the points with x listed first; these points
label the columns of a 4s × 4s matrix H , determined as follows. Place the all-one vector
in the first row; subsequent rows are associated with the blocks of S in some order and
each row is obtained by placing a 1 in a column if the point labeling the column is in the
block and a −1 in a column if the point is not in the block. Clearly, D = D(H ) if we
show H is a normalized Hadamard matrix as the −1 entries in a row determine one of the
blocks complementary to a block of S. As all entries in the first row and column of H
are 1, we only need to show H is a Hadamard matrix. As every entry in each row is ±1,
the inner product of a row with itself is 4s. As every block has 2s points (and hence the
corresponding row has 2s 1s and 2s −1s) and distinct blocks of S meet in s points, the inner
product of distinct rows must be 0. Thus H H T = 4s I4s , implying that H is a Hadamard
matrix.
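For s = 2 the correspondence of Theorems 8.9.7 and 8.9.8 can be tested by computer; the sketch below (illustrative) forms D(H) from the normalized order-8 Sylvester matrix and checks that it is a 3-(8, 4, 1) design.

```python
# D(H) from a normalized Hadamard matrix of order 8 is a 3-(8, 4, 1) design.
import numpy as np
from itertools import combinations

H2 = np.array([[1, 1], [1, -1]])
H8 = np.kron(H2, np.kron(H2, H2))        # normalized: first row/column all 1s

blocks = []
for row in H8[1:]:                        # delete the first row
    blocks.append(frozenset(np.flatnonzero(row == 1)))
    blocks.append(frozenset(np.flatnonzero(row == -1)))

assert len(blocks) == 14                  # 8s - 2 blocks of size 2s = 4
assert all(len(B) == 4 for B in blocks)
for T in combinations(range(8), 3):       # every 3-set lies in s - 1 = 1 block
    assert sum(set(T) <= B for B in blocks) == 1
```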
Another construction of an infinite family of Hadamard matrices is given by quadratic
residues in a finite field Fq, where q ≡ ±1 (mod 4). Define χ : Fq → {0, −1, 1} by

χ(a) = 0 if a = 0,
χ(a) = 1 if a = b^2 for some b ∈ Fq with b ≠ 0,
χ(a) = −1 otherwise.
Note that χ(ab) = χ(a)χ(b). Notice also that if q = p, where p is a prime, and if
a ≠ 0, then χ(a) = (a/p), the Legendre symbol defined in Section 6.6. Let a_0 = 0, a_1 = 1,
a_2, . . . , a_{q−1} be an ordering of the elements of Fq. If q is a prime, we let a_i = i. Let Sq
be the (q + 1) × (q + 1) matrix [s_{i,j}], with indices ∞, a_0, . . . , a_{q−1}, where s_{i,j} is defined
as

s_{i,j} = 0             if i = ∞ and j = ∞,
s_{i,j} = 1             if i = ∞ and j ≠ ∞,
s_{i,j} = χ(−1)         if i ≠ ∞ and j = ∞,
s_{i,j} = χ(a_i − a_j)  if i ≠ ∞ and j ≠ ∞.    (8.19)
Example 8.9.9 When q = 5, the matrix S5 is

        ∞   0   1   2   3   4
  ∞ [   0   1   1   1   1   1 ]
  0 [   1   0   1  −1  −1   1 ]
  1 [   1   1   0   1  −1  −1 ]
  2 [   1  −1   1   0   1  −1 ]
  3 [   1  −1  −1   1   0   1 ]
  4 [   1   1  −1  −1   1   0 ]
When q is a prime, Sq is a bordered circulant matrix. To analyze Sq further, we need the
following result that extends Lemma 6.2.4.
Lemma 8.9.10 If q ≡ ±1 (mod 4), then χ(−1) = 1 if q ≡ 1 (mod 4), and χ(−1) = −1 if
q ≡ −1 (mod 4).
Proof: Let p be the characteristic of Fq and let q = p^t. If p ≡ 1 (mod 4), then −1 is a
square in F_p ⊆ Fq by Lemma 6.2.4 and q ≡ 1 (mod 4). Suppose that p ≡ −1 (mod 4). If
t is even, then x^2 + 1 has a root in F_{p^2} ⊆ Fq and again q ≡ 1 (mod 4). If t is odd, then
x^2 + 1 has a root in F_{p^2} but not F_p by Lemma 6.2.4. As F_{p^2} is not contained in Fq, −1 is
not a square in Fq and in this case q ≡ −1 (mod 4).
Theorem 8.9.11 The following hold where q is an odd prime power:
(i) If q ≡ 1 (mod 4), then Sq = Sq^T.
(ii) If q ≡ −1 (mod 4), then Sq = −Sq^T.
(iii) Sq Sq^T = q I_{q+1}.
Proof: To prove (i) and (ii), it suffices to show that s_{i,j} = χ(−1)s_{j,i} by Lemma 8.9.10.
This is clear if one or both of i or j is ∞ as χ(−1) = 1 if q ≡ 1 (mod 4), and χ(−1) = −1
if q ≡ −1 (mod 4) by Lemma 8.9.10. As s_{i,j} = χ(a_i − a_j) = χ(−1)χ(a_j − a_i) =
χ(−1)s_{j,i}, (i) and (ii) are true.
To show (iii), we must compute the inner product (over the integers) of two rows of Sq,
and show that if the rows are the same, the inner product is q, and if the rows are distinct, the
inner product is 0. The former is clear as every row has q entries that are ±1 and one entry
0. Since Fq has odd characteristic, the nonzero elements of Fq, denoted Fq^*, form a cyclic
group of even order, implying that half these elements are squares and half are nonsquares
(a generalization of Lemma 6.6.1). In particular, this implies that

Σ_{a∈Fq} χ(a) = 0,    (8.20)

which shows that the inner product of the row labeled ∞ with any other row is 0. The inner
product of row i and row j with i ≠ j and neither equaling ∞ is

1 + Σ_{k=0}^{q−1} χ(a_i − a_k)χ(a_j − a_k).
However, this is

1 + Σ_{b∈Fq} χ(b)χ(c + b) = 1 + Σ_{b∈Fq^*} χ(b)χ(c + b),

where b = a_i − a_k runs through Fq as a_k does and c = a_j − a_i ∈ Fq^*. Since χ(c + b) =
χ(b)χ(cb^{−1} + 1) if b ≠ 0, the inner product becomes

1 + Σ_{b∈Fq^*} χ(b)^2 χ(cb^{−1} + 1) = 1 + Σ_{b∈Fq^*} χ(cb^{−1} + 1).

As cb^{−1} + 1 runs through all elements of Fq except 1 as b runs through all elements of Fq^*,
by (8.20) the inner product is 0.
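For prime q the matrix Sq of (8.19) is easy to generate, and Theorem 8.9.11 can be confirmed numerically; the sketch below (with illustrative values of q) computes χ by Euler's criterion.

```python
# Build S_q for a prime q and verify Theorem 8.9.11.
import numpy as np

def paley_S(q):
    """The (q+1) x (q+1) matrix S_q of (8.19) for a prime q."""
    chi = lambda a: 0 if a % q == 0 else (1 if pow(a, (q - 1) // 2, q) == 1 else -1)
    S = np.zeros((q + 1, q + 1), dtype=int)
    S[0, 1:] = 1                         # row labeled infinity
    S[1:, 0] = chi(-1)                   # column labeled infinity
    for i in range(q):
        for j in range(q):
            S[i + 1, j + 1] = chi(i - j)
    return S

for q in [5, 13]:                        # q =  1 (mod 4): S_q symmetric
    S = paley_S(q)
    assert np.array_equal(S, S.T)
    assert np.array_equal(S @ S.T, q * np.eye(q + 1, dtype=int))
for q in [7, 11]:                        # q = -1 (mod 4): S_q antisymmetric
    S = paley_S(q)
    assert np.array_equal(S, -S.T)
    assert np.array_equal(S @ S.T, q * np.eye(q + 1, dtype=int))
```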
Corollary 8.9.12 We have the following Hadamard matrices:
(i) If q ≡ −1 (mod 4), then I_{q+1} + Sq is a Hadamard matrix of order q + 1.
(ii) If q ≡ 1 (mod 4), then

[ I_{q+1} + Sq   −I_{q+1} + Sq ]
[ I_{q+1} − Sq    I_{q+1} + Sq ]

is a Hadamard matrix of order 2q + 2.
(iii) If q ≡ −1 (mod 4), then

[ I_{q+1} + Sq    I_{q+1} + Sq ]        [ I_{q+1} + Sq    I_{q+1} − Sq ]
[ I_{q+1} + Sq   −I_{q+1} − Sq ]  and   [ I_{q+1} + Sq   −I_{q+1} + Sq ]

are Hadamard matrices of order 2q + 2.
Exercise 490 Prove Corollary 8.9.12.
Exercise 491 Show that Hadamard matrices exist of all orders n ≤ 100 where n ≡ 0
(mod 4) except possibly n = 92. (Note that a Hadamard matrix of order 92 exists, but we
have not covered its construction.)
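Corollary 8.9.12(i) can likewise be exercised directly; the sketch below (illustrative, not from the text) builds a Hadamard matrix of order 12 from q = 11.

```python
# Corollary 8.9.12(i): for q = 11 = -1 (mod 4), I + S_q is a Hadamard
# matrix of order q + 1 = 12.
import numpy as np

q = 11
chi = lambda a: 0 if a % q == 0 else (1 if pow(a, (q - 1) // 2, q) == 1 else -1)
S = np.zeros((q + 1, q + 1), dtype=int)
S[0, 1:] = 1
S[1:, 0] = chi(-1)                       # = -1 since q = -1 (mod 4)
for i in range(q):
    for j in range(q):
        S[i + 1, j + 1] = chi(i - j)

H = np.eye(q + 1, dtype=int) + S         # entries are all ±1
assert np.array_equal(H @ H.T, (q + 1) * np.eye(q + 1, dtype=int))
```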
9
Self-dual codes
In this chapter, we study the family of self-dual codes. Of primary interest will be upper bounds on the minimum weight of these codes, their possible weight enumerators,
and their enumeration and classification for small to moderate lengths. We restrict our
examination of self-dual codes to those over the three smallest fields F2 , F3 , and F4 . In
the case of codes over F4 , the codes will be self-dual under the Hermitian inner product. We emphasize self-dual codes over these three fields not only because they are the
smallest fields but also because self-dual codes over these fields are divisible, implying that there are regular gaps in their weight distributions. This is unlike the situation
over larger fields, a result known as the Gleason–Pierce–Ward Theorem with which we
begin.
A self-dual code has the obvious property that its weight distribution is the same as that of
its dual. Some of the results proved for self-dual codes require only this equality of weight
distributions. This has led to a broader class of codes known as formally self-dual codes that
include self-dual codes. Recall from Section 8.4 that a code C is formally self-dual provided
C and C ⊥ have the same weight distribution. This implies of course that C is an [n, n/2]
code and hence that n is even. Although this chapter deals mainly with self-dual codes, we
will point out situations where the results of this chapter apply to formally self-dual codes
and devote a section of the chapter to these codes.
9.1
The Gleason–Pierce–Ward Theorem
The Gleason–Pierce–Ward Theorem provides the main motivation for studying self-dual
codes over F2, F3, and F4. These codes have the property that they are divisible. Recall
that a code C is divisible by Δ provided all codewords have weights divisible
by an integer Δ, called a divisor of C; the code is called divisible if it has a divisor
Δ > 1.
When C is an [n, n/2] code, Theorems 1.4.5, 1.4.8, and 1.4.10 imply:
• If C is a self-dual code over F2, then C is divisible by Δ = 2. If C is a divisible code over
F2 with divisor Δ = 4, then C is self-dual.
• C is a self-dual code over F3 if and only if C is divisible by Δ = 3.
• C is a Hermitian self-dual code over F4 if and only if C is divisible by Δ = 2.
The Gleason–Pierce–Ward Theorem states that divisible [n, n/2] codes exist only for the
values of q and Δ given above, except in one trivial situation, and that the codes are always
self-dual except possibly when q = Δ = 2.
Theorem 9.1.1 (Gleason–Pierce–Ward) Let C be an [n, n/2] divisible code over Fq with
divisor Δ > 1. Then one (or more) of the following holds:
(i) q = 2 and Δ = 2, or
(ii) q = 2, Δ = 4, and C is self-dual, or
(iii) q = 3, Δ = 3, and C is self-dual, or
(iv) q = 4, Δ = 2, and C is Hermitian self-dual, or
(v) Δ = 2 and C is equivalent to the code over Fq with generator matrix [In/2 In/2 ].
The Gleason–Pierce–Ward Theorem generalizes the Gleason–Pierce Theorem, a proof
of which can be found in [8, 314]. In the original Gleason–Pierce Theorem, there is a
stronger hypothesis; namely, C is required to be formally self-dual. In the original theorem,
conclusions (i)–(iv) are the same, but conclusion (v) states that Δ = 2 and C is a code
with weight enumerator (y^2 + (q − 1)x^2)^(n/2). The generalization was proved by Ward in
[345, 347]. We present the proof of the Gleason–Pierce–Ward Theorem in Section 9.11.
Notice that the codes arising in cases (ii) through (v) are self-dual (Hermitian if q = 4);
however, the codes in (i) are merely even and not necessarily formally self-dual.
The codes that arise in the conclusion of this theorem have been given specific names.
Doubly-even self-dual binary codes are called Type II codes; these are the codes arising in
part (ii) of the Gleason–Pierce–Ward Theorem. The self-dual ternary codes from part (iii)
are called Type III codes; the Hermitian self-dual codes over F4 are called Type IV. The
codes in part (ii) also satisfy the conditions in part (i). The binary self-dual codes that are
not doubly-even (or Type II) are called Type I. In other words, Type I codes are singly-even
self-dual binary codes.1
Example 9.1.2 Part (i) of the Gleason–Pierce–Ward Theorem suggests that there may be
binary codes of dimension half their length that are divisible by Δ = 2 but are not self-dual.
This is indeed the case; for instance, the [6, 3, 2] binary code with generator matrix

        [1 0 0 1 1 1]
    G = [0 1 0 1 1 1]
        [0 0 1 1 1 1]

is a formally self-dual code divisible by Δ = 2 that is not self-dual. As additional examples,
by Theorem 6.6.14, the binary odd-like quadratic residue codes when extended are all
formally self-dual codes divisible by Δ = 2; those of length p + 1, where p ≡ 1 (mod 8),
are not self-dual.
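The claims about this [6, 3, 2] code can be checked by brute force; the sketch below (the helper names are ours, not from the text) enumerates the code and its dual over F2.

```python
from itertools import product

# Generator matrix of the [6,3,2] code from Example 9.1.2.
G = [[1, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 1, 1],
     [0, 0, 1, 1, 1, 1]]

def span(gen, n):
    """All F_2 linear combinations of the rows of gen."""
    return {tuple(sum(c * row[j] for c, row in zip(coeffs, gen)) % 2
                  for j in range(n))
            for coeffs in product([0, 1], repeat=len(gen))}

def dual(code, n):
    """Brute-force dual: all vectors orthogonal to every codeword."""
    return {v for v in product([0, 1], repeat=n)
            if all(sum(a * b for a, b in zip(v, c)) % 2 == 0 for c in code)}

def weight_dist(code, n):
    dist = [0] * (n + 1)
    for c in code:
        dist[sum(c)] += 1
    return dist

C = span(G, 6)
Cperp = dual(C, 6)
print(all(sum(c) % 2 == 0 for c in C))              # True: divisible by Delta = 2
print(weight_dist(C, 6) == weight_dist(Cperp, 6))   # True: formally self-dual
print(C == Cperp)                                   # False: not self-dual
```

The weight distribution comes out as A0 = 1, A2 = 3, A4 = 3, A6 = 1, so WC = (y^2 + x^2)^3, which is its own MacWilliams transform.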
Exercise 492 Let C be the binary code with generator matrix

    [1 1 0 0 0 0]
    [1 0 1 0 0 0]
    [0 0 0 1 1 0]

Show that C is divisible by Δ = 2 and is not formally self-dual.
¹ Unfortunately in the literature there is confusion over the term Type I; some authors will simply call self-dual
binary codes Type I codes allowing the Type II codes to be a subset of Type I codes. To add to the confusion
further, Type I and Type II have been used to refer to certain other nonbinary codes.
We conclude this section with a theorem specifying exactly the condition on the length
n which guarantees the existence of a self-dual code of that length; its proof is found in
[260]. In Corollary 9.2.2, we give necessary and sufficient conditions on the length for the
existence of self-dual codes over F2 and F3 , which agree with the following theorem; the
corollary will also give necessary and sufficient conditions on the length for existence of
Type II codes that are not covered in this theorem.
Theorem 9.1.3 There exists a self-dual code over Fq of even length n if and only if (−1)n/2
is a square in Fq . Furthermore, if n is even and (−1)n/2 is not a square in Fq , then the
dimension of a maximal self-orthogonal code of length n is (n/2) − 1. If n is odd, then the
dimension of a maximal self-orthogonal code of length n is (n − 1)/2.
Example 9.1.4 Theorem 9.1.3 shows that a self-dual code of length n over F3 exists if and
only if 4 | n. An example of such a code is the direct sum of m = n/4 copies of the [4, 2, 3]
self-dual tetracode introduced in Example 1.3.3. If n ≡ 2 (mod 4), then no self-dual code of
length n over F3 exists; however, a maximal self-orthogonal code of dimension (n/2) − 1
does exist. An example is the direct sum of m = (n − 2)/4 copies of the tetracode and one
copy of the zero code of length 2. The theorem also guarantees that a self-dual code of length
n over F9 exists if and only if n is even. An example is the direct sum of m = n/2 copies of
the [2, 1, 2] self-dual code with generator matrix [1 ρ^2], where ρ is the primitive element
of F9 given in Table 6.4. Notice that the codes we have constructed show one direction of
Theorem 9.1.3 when the fields are F3 and F9 .
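The constructions of Example 9.1.4 over F3 can be sketched as follows (all helper names are ours): since the tetracode is self-orthogonal of dimension n/2, it is self-dual, and direct sums of copies of it give self-dual codes whenever 4 | n.

```python
from itertools import product

G_tetra = [[1, 0, 1, 1],
           [0, 1, 1, 2]]   # the [4,2,3] tetracode over F_3 (Example 1.3.3)

def codewords(gen, q=3):
    """All F_q linear combinations of the rows of gen."""
    n = len(gen[0])
    return {tuple(sum(c * row[j] for c, row in zip(coeffs, gen)) % q
                  for j in range(n))
            for coeffs in product(range(q), repeat=len(gen))}

def is_self_orthogonal(gen, q=3):
    """Every pair of generator rows has zero inner product mod q."""
    return all(sum(a * b for a, b in zip(r, s)) % q == 0
               for r in gen for s in gen)

C = codewords(G_tetra)
d = min(sum(1 for x in c if x) for c in C if any(c))
# self-orthogonal with dimension n/2  =>  self-dual
print(is_self_orthogonal(G_tetra), len(C) == 3**2, d)   # True True 3

# Direct sum of two tetracodes: an [8,4,3] self-dual code over F_3.
G_sum = ([row + [0] * 4 for row in G_tetra] +
         [[0] * 4 + row for row in G_tetra])
print(is_self_orthogonal(G_sum) and len(codewords(G_sum)) == 3**4)   # True
```

Repeating the direct sum m = n/4 times realizes every length n ≡ 0 (mod 4), one direction of Theorem 9.1.3 over F3.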
Exercise 493 Use the codes constructed in Example 9.1.4 to show that if 4 | n, there exist
self-dual codes of length n over F_{3^t}, and if n is even, there exist self-dual codes of length n
over F_{3^t} when t is even.
Exercise 494 Prove Theorem 9.1.3 for q a power of 2.
9.2
Gleason polynomials
Self-dual codes over F2 , F3 , and F4 , the last under the Hermitian inner product, all have
weight enumerators that can be expressed as combinations of special polynomials that are
the weight enumerators of specific codes of small length. These polynomials, known as
Gleason polynomials, provide a powerful tool in the study of all self-dual codes over F2 ,
F3 , and F4 .
Before stating Gleason’s Theorem, we discuss the theorem in the case of formally self-dual binary codes. Let
g(x, y) = y 2 + x 2
and
h(x, y) = y 8 + 14x 4 y 4 + x 8 .
Notice that g(x, y) is the weight enumerator of the binary repetition code of length 2,
and h(x, y) is the weight enumerator of the [8, 4, 4] extended binary Hamming code Ĥ3.
Gleason’s Theorem states that if C is any binary formally self-dual code that is divisible by
Δ = 2, then WC (x, y) = P(g, h), where P is a polynomial with rational coefficients in the
variables g and h. Because WC (x, y) is homogeneous of degree n,

    P(g, h) = Σ_{i=0}^{⌊n/8⌋} a_i g(x, y)^{n/2−4i} h(x, y)^i,

where each a_i is rational. Also, because A0 (C) = 1, Σ_{i=0}^{⌊n/8⌋} a_i = 1. For example, if n = 6, then
WC (x, y) = g(x, y)^3 = (y^2 + x^2)^3; the code C that is the direct sum of three copies of the
[2, 1, 2] repetition code has this weight enumerator, by Theorem 7.2.2.
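By Theorem 7.2.2 the weight enumerator of a direct sum is the product of the summands' enumerators, so the n = 6 claim above can be verified by polynomial multiplication; the sketch below (our own helper name) stores an enumerator as the list of its coefficients A_i.

```python
def we_product(A, B):
    """Weight enumerator of a direct sum: coefficient-list convolution,
    where A[i] is the number of codewords of weight i."""
    out = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            out[i + j] += a * b
    return out

rep2 = [1, 0, 1]   # g(x,y) = y^2 + x^2: one word of weight 0, one of weight 2
W = [1]            # enumerator of the zero-length code
for _ in range(3):
    W = we_product(W, rep2)
print(W)   # [1, 0, 3, 0, 3, 0, 1], i.e. y^6 + 3x^2y^4 + 3x^4y^2 + x^6
```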
Theorem 9.2.1 (Gleason) Let C be an [n, n/2] code over Fq , where q = 2, 3, or 4. Let

    g1 (x, y) = y^2 + x^2,
    g2 (x, y) = y^8 + 14x^4y^4 + x^8,
    g3 (x, y) = y^24 + 759x^8y^16 + 2576x^12y^12 + 759x^16y^8 + x^24,
    g4 (x, y) = y^4 + 8x^3y,
    g5 (x, y) = y^12 + 264x^6y^6 + 440x^9y^3 + 24x^12,
    g6 (x, y) = y^2 + 3x^2, and
    g7 (x, y) = y^6 + 45x^4y^2 + 18x^6.

Then:
(i) if q = 2 and C is formally self-dual and even,
        WC (x, y) = Σ_{i=0}^{⌊n/8⌋} a_i g1 (x, y)^{n/2−4i} g2 (x, y)^i,
(ii) if q = 2 and C is self-dual and doubly-even,
        WC (x, y) = Σ_{i=0}^{⌊n/24⌋} a_i g2 (x, y)^{n/8−3i} g3 (x, y)^i,
(iii) if q = 3 and C is self-dual,
        WC (x, y) = Σ_{i=0}^{⌊n/12⌋} a_i g4 (x, y)^{n/4−3i} g5 (x, y)^i, and
(iv) if q = 4 and C is Hermitian self-dual,
        WC (x, y) = Σ_{i=0}^{⌊n/6⌋} a_i g6 (x, y)^{n/2−3i} g7 (x, y)^i.

In all cases, all a_i are rational and Σ_i a_i = 1.
Proof: We will prove part (i) of Gleason’s Theorem and leave parts (ii), (iii), and (iv) as
exercises. Assume that C is a formally self-dual binary code of length n with all weights
even. If Ai = Ai (C) and Ai⊥ = Ai (C ⊥ ) are the weight distributions of C and C ⊥ , respectively,
then Ai = Ai⊥ for 0 ≤ i ≤ n. As C is an even code, C ⊥ must contain the all-one vector
1, implying that 1 = An⊥ = An and hence that 1 ∈ C. Therefore Ai = 0 if i is odd and
An−i = Ai for all i. For an integer j we choose shortly, let S = {2 j, 2 j + 2, . . . , n − 2 j};
then |S| = s = (n − 4 j)/2 + 1. If A1 , A2 , . . . , A2 j−1 are known, then the only unknowns
are Ai for i ∈ S. By Theorem 7.3.1, if 2 j − 1 ≥ s − 1 (that is j ≥ (n + 2)/8), then the
Ai s with i ∈ S are uniquely determined by the Pless power moments. Set j = ⌈(n + 2)/8⌉.
Since Ai = 0 if i is odd, once the j − 1 values A2 , A4 , . . . , A2 j−2 are known, then all Ai
are uniquely determined.
The power moments (P1 ) of Section 7.2 with Ai = Ai⊥ can be viewed (by moving the
right-hand side to the left) as a homogeneous system of linear equations. We have the
additional linear homogeneous equations Ai − An−i = 0 for all i and Ai = 0 for i odd.
Consider all solutions of this combined system over the rational numbers Q. The solutions
to this combined system form a subspace of Qn+1 , and by the above argument the dimension
of this subspace is j = ⌈(n + 2)/8⌉ (we do not assume that A0 = 1 but allow A0 to be
any rational number). By Exercise 495, j = ⌊n/8⌋ + 1. We give another complete set of
solutions of this same system. By Theorem 7.2.2, for 0 ≤ i ≤ ⌊n/8⌋, g1 (x, y)^{n/2−4i} g2 (x, y)^i
is the weight enumerator of the direct sum of (n/2) − 4i copies of the [2, 1, 2] binary
repetition code and i copies of Ĥ3; this direct sum is a self-dual code of length n. Thus
each of these polynomials leads to a solution of our homogeneous system. Therefore every
polynomial Σ_{i=0}^{⌊n/8⌋} a_i g1 (x, y)^{n/2−4i} g2 (x, y)^i with a_i ∈ Q also leads to a solution of this
system. By Exercise 496, {g1 (x, y)^{n/2−4i} g2 (x, y)^i | 0 ≤ i ≤ ⌊n/8⌋} is a linearly independent
set of j = ⌊n/8⌋ + 1 solutions, and hence all solutions of our homogeneous system can be
derived from rational combinations of this set, proving part (i). Note that Σ_i a_i = 1 comes
from the requirement that A0 = 1. □
Exercise 495 Show that if n is an even positive integer, then

    ⌈(n + 2)/8⌉ − 1 = ⌊n/8⌋.
Exercise 496 With the notation of Gleason’s Theorem do the following:
(a) Prove that if we let g2′ (x, y) = x^2y^2(x^2 − y^2)^2, then we can use g2′ (x, y) in place of
g2 (x, y) by showing that given a_i ∈ Q (or b_i ∈ Q), there exist b_i ∈ Q (or a_i ∈ Q) such
that

    Σ_{i=0}^{⌊n/8⌋} a_i g1 (x, y)^{n/2−4i} g2 (x, y)^i = Σ_{i=0}^{⌊n/8⌋} b_i g1 (x, y)^{n/2−4i} g2′ (x, y)^i.

(b) Prove that {g1 (x, y)^{n/2−4i} g2′ (x, y)^i | 0 ≤ i ≤ ⌊n/8⌋} is linearly independent over Q.
(c) Prove that {g1 (x, y)^{n/2−4i} g2 (x, y)^i | 0 ≤ i ≤ ⌊n/8⌋} is linearly independent over Q.
The polynomials g1 , . . . , g7 presented in Gleason’s Theorem are all weight enumerators
of certain self-dual codes and are called Gleason polynomials. The polynomials g1 and g2
were discussed earlier. The weight enumerators of the extended binary and ternary Golay
codes are g3 and g5 , by Exercise 384 and Example 7.3.2, respectively. The tetracode of
Example 1.3.3 has weight enumerator g4 ; see Exercise 19. The repetition code of length 2
over F4 has weight enumerator g6 , and the hexacode, given in Example 1.3.4, has weight
enumerator g7 ; see Exercises 19 and 391.
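The expansions in Gleason's Theorem can be computed explicitly. As a sketch (all function names are ours), one can expand the basis polynomials g1^{n/2−4i} g2^i into coefficient vectors and solve the resulting linear system exactly over Q; here we expand the weight enumerator of the [24, 12, 8] extended binary Golay code, g3, in the part (i) basis.

```python
from fractions import Fraction

# Expand Gleason basis polynomials as coefficient vectors (entry w is the
# coefficient of x^w y^(n-w)) and solve exactly for the rational a_i.

def poly_mul(A, B):
    out = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            out[i + j] += a * b
    return out

def poly_pow(A, k):
    out = [1]
    for _ in range(k):
        out = poly_mul(out, A)
    return out

g1 = [1, 0, 1]                      # y^2 + x^2
g2 = [1, 0, 0, 0, 14, 0, 0, 0, 1]  # y^8 + 14x^4y^4 + x^8

n = 24
basis = [poly_mul(poly_pow(g1, n // 2 - 4 * i), poly_pow(g2, i))
         for i in range(n // 8 + 1)]        # all homogeneous of degree n

golay = [0] * (n + 1)                       # the Golay weight distribution
for w, Aw in [(0, 1), (8, 759), (12, 2576), (16, 759), (24, 1)]:
    golay[w] = Aw

# Gauss-Jordan elimination over Q on the (n+1) x (m+1) augmented system.
m = len(basis)
rows = [[Fraction(basis[i][w]) for i in range(m)] + [Fraction(golay[w])]
        for w in range(n + 1)]
for col in range(m):
    piv = next(r for r in range(col, len(rows)) if rows[r][col] != 0)
    rows[col], rows[piv] = rows[piv], rows[col]
    rows[col] = [x / rows[col][col] for x in rows[col]]
    for r in range(len(rows)):
        if r != col and rows[r][col] != 0:
            rows[r] = [x - rows[r][col] * y for x, y in zip(rows[r], rows[col])]
a = [rows[i][m] for i in range(m)]

recon = [sum(a[i] * basis[i][w] for i in range(m)) for w in range(n + 1)]
print([str(x) for x in a])            # ['-21/8', '21/4', '-21/8', '1']
print(recon == golay, sum(a) == 1)    # True True
```

The solve is well posed because the basis is linearly independent (Exercise 496(c)); replacing `golay` by any other even formally self-dual length-24 enumerator yields that code's Gleason coefficients.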
Exercise 497 Prove parts (ii), (iii), and (iv) of Gleason’s Theorem. Note that in parts
(iii) and (iv), you cannot assume that Ai = An−i . In part (ii) it may be helpful, as in
Exercise 496, to replace g3 (x, y) by g3′ (x, y) = x^4y^4(x^4 − y^4)^4; in part (iii), you can replace
g5 (x, y) by g5′ (x, y) = x^3(x^3 − y^3)^3; and in part (iv), you can replace g7 (x, y) by g7′ (x, y) =
x^2(x^2 − y^2)^2.
The proof of parts (i), (ii), and (iii) of Gleason’s Theorem is found in [19]. There are other
proofs. One of the nicest involves the use of invariant theory. The proof of part (iv) using
invariant theory is found in [215, 217]. This approach has been applied to many other types
of weight enumerators, such as complete weight enumerators, and to weight enumerators of
other families of codes. In the proof, one can show that the weight enumerator of the code
under consideration is invariant under a finite group of matrices, the specific group depending
on the family of codes. This group includes a matrix directly associated to the MacWilliams
equations (M3 ) of Section 7.2. Once this group is known, the theory of invariants allows
one to write a power series associated to a polynomial ring determined by this group. After
this power series is computed, one can look for polynomials of degrees indicated by this
power series to use in a Gleason-type theorem. Unfortunately, the development of invariant
theory required for this is too extensive for this book. Readers interested in this approach
should consult [215, 218, 291, 313].
Exercise 498 Show that the possible weight distributions of an even formally self-dual
binary code of length n = 8 are:

    A0 = A8:   1   1   1   1   1   1   1   1
    A2 = A6:   0   1   2   3   4   5   6   7
    A4:       14  12  10   8   6   4   2   0
Exercise 499 Show that only two of the weight distributions in Exercise 498 actually arise
as distributions of self-dual binary codes of length 8.
Exercise 500 Make a table as in Exercise 498 of the possible weight distributions of an
even formally self-dual binary code of length n = 10.
Exercise 501 Do the following:
(a) Using the table from Exercise 500 give the weight distribution of a [10, 5, 4] even
formally self-dual binary code.
(b) Find a [10, 5, 4] even formally self-dual binary code. (By Example 9.4.2, this code will
not be self-dual.)
Exercises 498 and 500 illustrate how Gleason’s Theorem can be used to find possible
weight distributions of even formally self-dual binary codes. These exercises together with
Exercises 499 and 501 indicate that a possible weight distribution arising from Gleason’s
Theorem may not actually occur as the weight distribution of any code, or it may arise as
the weight distribution of an even formally self-dual code but not of a self-dual code.
The following is an immediate consequence of Gleason’s Theorem.
Corollary 9.2.2 Self-dual doubly-even binary codes of length n exist if and only if 8 | n;
self-dual ternary codes of length n exist if and only if 4 | n; and Hermitian self-dual codes
over F4 of length n exist if and only if n is even.
Exercise 502 Prove Corollary 9.2.2.
Note that Theorem 9.1.3 also implies that self-dual codes over F3 of length n exist if
and only if 4 | n; see Example 9.1.4. Hermitian self-dual codes over F4 of length n have
only even weight codewords by Theorem 1.4.10; therefore by Exercise 383 these codes,
being self-dual, have a codeword of weight n, and hence n must be even giving an alternate
verification of part of Corollary 9.2.2. This argument also shows that every Hermitian
self-dual code over F4 is equivalent to one that contains the all-one codeword.
9.3
Upper bounds
We can use Gleason’s Theorem to determine upper bounds on the minimum distance of
the codes arising in that theorem. We present some of these bounds in this section and give
improvements where possible.
As an illustration, consider the even formally self-dual binary codes of length n = 12 that
arise in (i) of Gleason’s Theorem. The weight enumerator of such a code C must have the
form WC (x, y) = a1 g1 (x, y)6 + (1 − a1 )g1 (x, y)2 g2 (x, y). By examining the coefficient of
x^2y^10, we have

    WC (x, y) = ((1/4)A2 − 1/2) g1 (x, y)^6 + (−(1/4)A2 + 3/2) g1 (x, y)^2 g2 (x, y),     (9.1)

and hence WC (x, y) is completely determined when A2 is known. If A2 = 0,

    WC (x, y) = y^12 + 15x^4y^8 + 32x^6y^6 + 15x^8y^4 + x^12.     (9.2)
This argument shows that a [12, 6] self-dual binary code has minimum distance at most 4.
A generalization of this argument gives a bound on the minimum distance of the codes in
Gleason’s Theorem.
Exercise 503 Verify (9.1) and (9.2).
Theorem 9.3.1 Let C be an [n, n/2, d] code over Fq , for q = 2, 3, or 4.
(i) If q = 2 and C is formally self-dual and even, then d ≤ 2 ⌊n/8⌋ + 2.
(ii) If q = 2 and C is self-dual doubly-even, then d ≤ 4 ⌊n/24⌋ + 4.
(iii) If q = 3 and C is self-dual, then d ≤ 3 ⌊n/12⌋ + 3.
(iv) If q = 4 and C is Hermitian self-dual, then d ≤ 2 ⌊n/6⌋ + 2.
In all cases, if equality holds in the bounds, the weight enumerator of C is unique.
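The four bounds are easy to tabulate; below is a quick sketch (the function name and kind labels are ours). The values at n = 24, 12, and 6 are attained by the extended binary Golay code, the extended ternary Golay code, and the hexacode, respectively.

```python
def gleason_bound(n, kind):
    """Upper bounds on the minimum distance d from Theorem 9.3.1."""
    if kind == 'fsd-even':   # (i)  q = 2, even formally self-dual
        return 2 * (n // 8) + 2
    if kind == 'II':         # (ii) q = 2, doubly-even self-dual
        return 4 * (n // 24) + 4
    if kind == 'III':        # (iii) q = 3, self-dual
        return 3 * (n // 12) + 3
    if kind == 'IV':         # (iv) q = 4, Hermitian self-dual
        return 2 * (n // 6) + 2
    raise ValueError(kind)

print(gleason_bound(24, 'II'), gleason_bound(12, 'III'),
      gleason_bound(6, 'IV'), gleason_bound(8, 'fsd-even'))   # 8 6 4 4
```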
These bounds were proved in [217, 223], and we omit the proofs. However, we give the
major idea used in the proof. Consider a Type II code C and assume that Ai (C) = 0 for
1 ≤ i < 4 ⌊n/24⌋ + 4. In our proof of Gleason’s Theorem (which is a model for the Type
II case you were asked to prove in Exercise 497), it is shown that the weight enumerator
WC (x, y) of C is uniquely determined. In this polynomial, the value of A4⌊n/24⌋+4 can be
determined explicitly; this value is positive and so no Type II code of higher minimum
weight could exist. A similar argument works in the other cases.
We turn now to the question of when these bounds can be met. This question has intrigued
researchers for many years. Suppose first that C is a Type II or Type IV code with minimum
weight meeting the appropriate bound of Theorem 9.3.1(ii) or (iv). The weight enumerator
is unique. If the length n is large enough, this weight enumerator has a negative coefficient,
which is clearly not possible. When searching for negative coefficients, a natural coefficient
to examine is the first nonzero one after Ad , where d is the minimum weight. The exact
situation is described by the following theorem found in [291].
Theorem 9.3.2 Let WC (x, y) be the weight enumerator of a self-dual code C of length n of
Type II or IV meeting bounds (ii) or (iv), respectively. Assume that n is divisible by 8 or 2,
respectively. The following hold, where Am is the coefficient of y m x n−m in WC (x, y):
• Type II: If n = 24i with i ≥ 154, n = 24i + 8 with i ≥ 159, or n = 24i + 16 with i ≥ 164, then Am < 0 for m = 4 ⌊n/24⌋ + 8. In particular, C cannot exist for n > 3928.
• Type IV: If n = 6i with i ≥ 17, n = 6i + 2 with i ≥ 20, or n = 6i + 4 with i ≥ 22, then Am < 0 for m = 2 ⌊n/6⌋ + 4. In particular, C cannot exist for n > 130.
This theorem only shows where one particular coefficient of WC (x, y) is negative. Of
course other coefficients besides Am given above may be negative, eliminating those values
of n. The following is known about the Type III case; see [291].
Theorem 9.3.3 Let C be a Type III code of length n and minimum distance d =
3 ⌊n/12⌋ + 3. Then C does not exist for n = 72, 96, 120, and all n ≥ 144.
The proof of this result involves showing that the weight enumerator of a Type III code
meeting the bound of Theorem 9.3.1(iii) has a negative coefficient. J. N. Pierce was the
first to use arguments on weight enumerators to show the nonexistence of codes when he
showed that a Type III [72, 36, 21] code, which meets the bound of Theorem 9.3.1(iii), does
not exist.
We now turn to the bound of Theorem 9.3.1(i). In [223] it was discovered that the weight
enumerator of an even formally self-dual binary code of length n meeting the bound of
Theorem 9.3.1(i) has a negative coefficient for n = 32, 40, 42, 48, 50, 52, and n ≥ 56.
This information allowed Ward to show in [344] that the only values of n for which self-dual codes exist meeting the bound are n = 2, 4, 6, 8, 12, 14, 22, and 24. There are also
even formally self-dual binary codes of lengths n = 10, 18, 20, 28, and 30 that meet the
bound, but no higher values of n; see [82, 83, 156, 165, 309]. (We note that for even n with
32 ≤ n ≤ 54, there are no [n, n/2, 2 ⌊n/8⌋ + 2] binary codes, self-dual or not, by [32].) We
summarize this in the following theorem.
Theorem 9.3.4 There exist self-dual binary codes of length n meeting the bound of Theorem 9.3.1(i) if and only if n = 2, 4, 6, 8, 12, 14, 22, and 24. There exist even formally
self-dual binary codes of length n meeting the bound of Theorem 9.3.1(i) if and only if n
is even with n ≤ 30, n ≠ 16, and n ≠ 26. For all these codes, the weight enumerator is
uniquely determined by the length.
The bound in Theorem 9.3.1(i) can be significantly improved for self-dual codes. The
following theorem is due to Rains [290] and shows that the bound for Type II codes is
almost the bound for Type I codes also; its proof makes use of the shadow introduced in
the next section.
Theorem 9.3.5 Let C be an [n, n/2, d] self-dual binary code. Then d ≤ 4 ⌊n/24⌋ + 4 if n ≢
22 (mod 24). If n ≡ 22 (mod 24), then d ≤ 4 ⌊n/24⌋ + 6. If 24 | n and d = 4 ⌊n/24⌋ + 4,
then C is Type II.
In Theorem 9.4.14, we will see that if n ≡ 22 (mod 24) and d = 4 ⌊n/24⌋ + 6, then C
can be obtained from a Type II [n + 2, (n/2) + 1, d + 2] code in a very specific way.
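Theorem 9.3.5 can be packaged as a small function (a sketch; the name is ours). Note that when 24 | n a Type I code cannot actually attain the returned value, since attaining it would force the code to be Type II.

```python
def self_dual_bound(n):
    """Minimum-distance bound of Theorem 9.3.5 for binary self-dual codes."""
    d = 4 * (n // 24) + 4
    if n % 24 == 22:
        d += 2            # the exceptional residue class n = 22 (mod 24)
    return d

# n = 46 = 22 (mod 24) allows d = 10, matching the extremal [46, 23, 10]
# Type I codes recorded in Table 9.1.
print(self_dual_bound(22), self_dual_bound(46), self_dual_bound(24))   # 6 10 8
```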
A Type II, III, or IV code meeting the bound of Theorem 9.3.1(ii), (iii), or (iv), respectively, is called an extremal Type II, III, or IV code. A Type I code meeting the bound of
Theorem 9.3.5 is called an extremal Type I code.2 Note that the minimum distance for an
extremal Type I code of length a multiple of 24 is 2 less than the Type II bound; that is
d = (n/6) + 2 is the minimum weight of an extremal Type I code of length n a multiple of
24. The formally self-dual codes of Theorem 9.3.4 are also called extremal. Extremal codes
of Type II, III, and IV all have unique weight enumerators and exist only for finitely many
lengths, as described above. Those of Type I of length exceeding 24 do not necessarily have
unique weight enumerators; it has not been proved that these exist for only finitely many
lengths, but that is likely the case.
We summarize what is known about the Type I and Type II codes in Table 9.1 for length
n with 2 ≤ n ≤ 72. In the table, “dI” is the largest minimum weight for which a Type I code
is known to exist, while “dII” is the largest minimum weight for which a Type II code is
known to exist. The superscript “E” indicates that the code is extremal; the superscript “O”
indicates the code is not extremal but optimal – that is, no code of the given type can exist
with a larger minimum weight. The number of inequivalent Type I and II codes of the given
minimum weight is listed under “numI” and “numII”, respectively. When the number in
that column is exact (without ≥), the classification of those codes is complete; some of the
codes that arise are discussed in Section 9.7. An entry in the column “dI” or “dII” such as
“10 (12E)” followed by a question mark in the next column indicates that there is a code
known of the smaller minimum weight but there is no code known for the higher minimum
weight. This table is taken from [291, Table X], where references are also provided, with
updates from [30, 168].
The [8, 4, 4] extended Hamming code is the unique Type II code of length 8 indicated in
Table 9.1. Two of the five [32, 16, 8] Type II codes are extended quadratic residue and Reed–
Muller codes. The [24, 12, 8] and [48, 24, 12] binary extended quadratic residue codes are
extremal Type II as exhibited in Example 6.6.23. However, the binary extended quadratic
residue code of length 72 has minimum weight 12 and hence is not extremal. The first two
lengths for which the existence of an extremal Type II code is undecided are 72 and 96. For
lengths beyond 96, extremal Type II codes are known to exist only for lengths 104 and 136.
Research Problem 9.3.6 Either construct or establish the nonexistence of a [56, 28, 12],
² Before the bound of Theorem 9.3.5 was proved, extremal codes of Type I were those that met the bound of
Theorem 9.3.1(i). The older bound is actually better than the newer bound for n ≤ 8. There was an intermediate
bound found in Theorem 1 of [58] that is also occasionally better than the bound of Theorem 9.3.5 for some
n ≤ 68, but not for any larger n.
Table 9.1 Type I and II codes of length 2 ≤ n ≤ 72

     n   dI         numI     dII        numII
     2   2O         1
     4   2O         1
     6   2O         1
     8   2O         1        4E         1
    10   2O         2
    12   4E         1
    14   4E         1
    16   4E         1        4E         2
    18   4E         2
    20   4E         7
    22   6E         1
    24   6E         1        8E         1
    26   6O         1
    28   6O         3
    30   6O         13
    32   8E         3        8E         5
    34   6O         ≥200
    36   8E         ≥14
    38   8E         ≥368
    40   8E         ≥22      8E         ≥1000
    42   8E         ≥30
    44   8E         ≥108
    46   10E        ≥1
    48   10E        ≥7       12E        ≥1
    50   10O        ≥6
    52   10O        ≥499
    54   10O        ≥54
    56   10 (12E)   ?        12E        ≥166
    58   10O        ≥101
    60   12E        ≥5
    62   12E        ≥1
    64   12E        ≥5       12E        ≥3270
    66   12E        ≥3
    68   12E        ≥65
    70   12 (14E)   ?
    72   12 (14E)   ?        12 (16E)   ?
a [70, 35, 14], and a [72, 36, 14] Type I code; do the same for a [72, 36, 16] and a [96, 48,
20] Type II code.
Self-dual codes with an automorphism of certain prime orders have either been classified
or shown not to exist. For example, it was shown in [54, 152, 265, 284] that there are no
[72, 36, 16] Type II codes with an automorphism of prime order p > 7. Along the same
lines, it is known (see [142]) that the only [48, 24, 12] Type II code with an automorphism
of odd order is an extended quadratic residue code. By a long computer search [140, 141], it
was shown that no other Type II code of length 48 exists. This leads to the following problem.
Research Problem 9.3.7 Prove without a lengthy computer search that the [48, 24, 12]
Type II code is unique.
By Theorem 9.3.3, extremal Type III codes of length n cannot exist for n = 72, 96, 120,
and all n ≥ 144. The smallest length for which the existence of an extremal Type III code
is undecided is length n = 52. Extremal Type III codes of length n are known to exist for
all n ≡ 0 (mod 4) with 4 ≤ n ≤ 64 except n = 52; all other lengths through 140 except
72, 96, and 120 remain undecided. Table 9.2 summarizes this information. In the table,
codes have length “n”; “dknown” is the largest minimum weight for which a Type III code
is known to exist. The superscripts “E” and “O” and the notation “12 (15E)” followed by a
question mark are as in Table 9.1. The number of inequivalent Type III codes of the given
minimum weight is listed under “num”; again those values given exactly indicate that the
classification is complete. This table is taken from [291, Table XII], where references are
provided.
Table 9.2 Type III codes of length 4 ≤ n ≤ 72

     n   dknown     num
     4   3E         1
     8   3E         1
    12   6E         1
    16   6E         1
    20   6E         6
    24   9E         2
    28   9E         ≥32
    32   9E         ≥239
    36   12E        ≥1
    40   12E        ≥20
    44   12E        ≥8
    48   15E        ≥2
    52   12 (15E)   ?
    56   15E        ≥1
    60   18E        ≥2
    64   18E        ≥1
    68   15 (18E)   ?
    72   18O        ≥1
Table 9.3 Type IV codes of length 2 ≤ n ≤ 30

     n   dknown     num
     2   2E         1
     4   2E         1
     6   4E         1
     8   4E         1
    10   4E         2
    12   4O         5
    14   6E         1
    16   6E         4
    18   8E         1
    20   8E         2
    22   8E         ≥46
    24   8O         ≥17
    26   8 (10E)    ?
    28   10E        ≥3
    30   12E        ≥1
The [4, 2, 3] tetracode is the unique Type III code of length 4, while the [12, 6, 6] extended
ternary Golay code is the unique Type III code of length 12. The two extremal Type III
codes of length 24 are an extended quadratic residue code and a symmetry code, described
in Section 10.5. The only known extremal Type III code of length 36 is a symmetry code,
the only two known of lengths 48 and 60 are extended quadratic residue and symmetry
codes, and the only one known of length 72 is an extended quadratic residue code.
Research Problem 9.3.8 Either construct or establish the nonexistence of any of the extremal Type III codes of lengths n = 52, 68, or 76 ≤ n ≤ 140 (except 96 and 120), where
4 | n.
By Theorem 9.3.2, extremal Type IV codes of even length n cannot exist if n > 130; they
are also known not to exist for lengths 12, 24, 102, 108, 114, 120, 122, 126, 128, and 130.
They are known to exist for lengths 2, 4, 6, 8, 10, 14, 16, 18, 20, 22, 28, and 30. Existence
is undecided for all other lengths, the smallest being length 26. It is known [144, 145] that
no [26, 13, 10] Type IV code exists with a nontrivial automorphism of odd order. Table 9.3
summarizes some of this information. Again, this table is taken from [291, Table XIII],
where references are also provided, with updates from [167]; the notation is as in Table 9.2.
In Table 9.3, the unique [6, 3, 4] Type IV code is the hexacode, the unique [14, 7, 6] code
is an extended quadratic residue code, and the unique [18, 9, 8] code is an extended duadic
(nonquadratic residue) code. The only known extremal Type IV code of length 30 is an
extended quadratic residue code.
Research Problem 9.3.9 Either construct or establish the nonexistence of any of the extremal Type IV codes of even lengths n = 26 or 32 ≤ n ≤ 124 (except 102, 108, 114, 120,
and 122).
The extremal codes of Types II, III, and IV hold t-designs as a consequence of the
Assmus–Mattson Theorem.
Theorem 9.3.10 The following results on t-designs hold in extremal codes of Types II, III,
and IV.
(i) Let C be a [24m + 8µ, 12m + 4µ, 4m + 4] extremal Type II code for µ = 0, 1, or 2.
Then codewords of any fixed weight except 0 hold t-designs for the following parameters:
(a) t = 5 if µ = 0 and m ≥ 1,
(b) t = 3 if µ = 1 and m ≥ 0, and
(c) t = 1 if µ = 2 and m ≥ 0.
(ii) Let C be a [12m + 4µ, 6m + 2µ, 3m + 3] extremal Type III code for µ = 0, 1, or 2.
Then codewords of any fixed weight i with 3m + 3 ≤ i ≤ 6m + 3 hold t-designs for
the following parameters:
(a) t = 5 if µ = 0 and m ≥ 1,
(b) t = 3 if µ = 1 and m ≥ 0, and
(c) t = 1 if µ = 2 and m ≥ 0.
(iii) Let C be a [6m + 2µ, 3m + µ, 2m + 2] extremal Type IV code for µ = 0, 1, or 2.
Then codewords of any fixed weight i with 2m + 2 ≤ i ≤ 3m + 2 hold t-designs for
the following parameters:
(a) t = 5 if µ = 0 and m ≥ 2,
(b) t = 3 if µ = 1 and m ≥ 1, and
(c) t = 1 if µ = 2 and m ≥ 0.
Proof: We prove (i)(a) and (ii)(a), leaving the remainder as an exercise. Suppose C is a
[24m, 12m, 4m + 4] Type II code with weight distribution Ai = Ai (C). As A24m−i = Ai and
Ai = 0 if 4 ∤ i, the set {i | 0 < i < 24m, Ai ≠ 0} has size at most s = 4m − 1. As 5 < 4m + 4
and s ≤ (4m + 4) − 5, (i)(a) follows from the Assmus–Mattson Theorem.
Suppose now that C is a [12m, 6m, 3m + 3] Type III code. The largest integer w
satisfying

    w − ⌊(w + 1)/2⌋ < 3m + 3

is w = 6m + 5. As Ai = 0 if 3 ∤ i, the set {i | 0 < i < 12m − 5, Ai ≠ 0} has size at most s =
3m − 2. As 5 < 3m + 3 and s ≤ (3m + 3) − 5, (ii)(a) follows from the Assmus–Mattson
Theorem. □
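The parameter λ of the designs produced this way follows from the standard counting identity for block designs, λ·C(v, t) = b·C(k, t), which is not restated in this section. The sketch below (our own function name) recovers the familiar 5-(24, 8, 1) Steiner system held by the 759 weight-8 codewords of the extended binary Golay code.

```python
from math import comb

def design_lambda(v, k, t, b):
    """lambda of a t-(v, k, lambda) design with b blocks, via the
    identity lambda * C(v, t) = b * C(k, t)."""
    num = b * comb(k, t)
    assert num % comb(v, t) == 0, "parameters are not those of a design"
    return num // comb(v, t)

print(design_lambda(24, 8, 5, 759))   # 1: the 5-(24, 8, 1) Steiner system
```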
Exercise 504 Complete the remaining parts of the proof of Theorem 9.3.10.
Janusz has generalized Theorem 9.3.10(i) in [157].
Theorem 9.3.11 Let C be a [24m + 8µ, 12m + 4µ, 4m + 4] extremal Type II code for
µ = 0, 1, or 2, where m ≥ 1 if µ = 0. Then either:
(i) the codewords of any fixed weight i ≠ 0 hold t-designs for t = 7 − 2µ, or
(ii) the codewords of any fixed weight i ≠ 0 hold t-designs for t = 5 − 2µ and there is no
i with 0 < i < 24m + 8µ such that codewords of weight i hold a (6 − 2µ)-design.
Currently there are no known extremal Type II codes where Theorem 9.3.11(ii) is false.
Applying this result to extremal Type II codes of length n = 24m, we see that codewords
of any fixed weight i ≠ 0 hold 5-designs, and if the codewords of some fixed weight i with
0 < i < n hold a 6-design, then the codewords of any fixed weight i ≠ 0 hold 7-designs.
We comment further about this at the end of the section.
In Example 8.4.6 we found the coset weight distribution of the [24, 12, 8] extended
binary Golay code with the assistance of the designs. The same technique can be used for
any extremal Type II code to compute the coset weight distribution of any coset of weight
t or less, where t is given in Theorem 9.3.10. By Theorem 7.5.2, the weight distribution
of a coset is often uniquely determined once a few weights are known. The values of
the intersection numbers λ_i^j for the t-designs will enable one to do the calculations as in
Example 8.4.6.
Exercise 505 We will see later that there are five inequivalent Type II [32, 16, 8] codes.
Let C be such a code. The weight enumerator of C is
WC (x, y) = y^32 + 620x^8y^24 + 13888x^12y^20 + 36518x^16y^16 + 13888x^20y^12 + 620x^24y^8 + x^32.
By Theorem 9.3.10(i)(b), the codewords of fixed nonzero weight hold 3-designs.
(a) Find the Pascal triangle for the 3-designs held by codewords of C of weights 8, 12,
16, 20, and 24. Note that the designs held by the weight 24 and weight 20 codewords
are the complements of the designs held by the weight 8 and weight 12 codewords,
respectively.
(b) Compute the coset weight distributions for the cosets of C of weights 1, 2, and 3.
(c) How many cosets of weights 1, 2, and 3 are there?
In [176] the following result is presented; it is due to Venkov. It shows that there is
surprising additional design structure in extremal Type II codes.
Theorem 9.3.12 Let C be an extremal Type II code of length n = 24m + 8µ, with µ = 0,
1, or 2. Let M be the set of codewords of minimum weight 4m + 4 in C. For an arbitrary
vector a ∈ F2^n and any j, define

    u_j(a) = |{x ∈ M | |supp(a) ∩ supp(x)| = j}|.

Also define k5 = (5m−2 choose m−1), k3 = (5m choose m), and k1 = 3(5m+2 choose m), where
(a choose b) denotes a binomial coefficient. Let t = 5 − 2µ. If wt(a) = t + 2,
then u_{t+1}(a) + (t − 4)u_{t+2}(a) = k_t, independent of the choice of a.
Example 9.3.13 Let C be the [24, 12, 8] extended binary Golay code. Then in Theorem 9.3.12, m = 1, µ = 0, t = 5, and k5 = 1. The theorem states that for any vector a ∈ F_2^24
of weight 7 there is exactly one minimum weight
codeword c whose support either contains the support of a or meets it in six coordinates. By
Theorem 9.3.10, the codewords of weight 8 hold a 5-design; this was a design we discussed
in Chapter 8, and its intersection numbers are given in Figure 8.3. While this design is not a
7-design (or even a 6-design), it does have additional design-like properties. Theorem 9.3.12
implies that any set of seven points in this design is either contained in exactly one block
or meets exactly one block in six points, but not both.
Extremal self-dual codes have provided many important designs, particularly 5-designs.
As Theorem 9.3.12 established, these designs have additional structure that almost turns a
t-design into a (t + 2)-design. Theorem 9.3.11 indicates that if you find a (t + 1)-design,
you find (t + 2)-designs. However, no one has yet found nontrivial 6-designs in codes,
and for a long time it was conjectured that no 6-designs existed. That conjecture has been
disproved, and nontrivial t-designs exist for all t [326].
Exercise 506 Let C be a [48, 24, 12] binary extended quadratic residue code; see
Example 6.6.23. By Theorem 9.3.10, the codewords of each fixed weight k hold 5-(48, k, λ)
designs. The weight distribution of C is A0 = A48 = 1, A12 = A36 = 17 296, A16 = A32 =
535 095, A20 = A28 = 3 995 376, and A24 = 7 681 680.
(a) Give the parameter λ for the 5-(48, k, λ) design, with k = 12, 16, 20, 24, 28, 32, and
36, held by the codewords in C.
(b) Prove that none of the designs in (a) can be 6-designs.
(c) What does Theorem 9.3.12 imply about the intersection of sets of points of size 7 with
blocks of the 5-(48, 12, λ) design from (a)?
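Parts (a) and (b) can be sanity-checked numerically. The sketch below is our own check, not from the text: a t-design held by the A_k codewords of weight k must have λ = A_k·C(k, t)/C(48, t), so integrality of this quantity for t = 5 yields the parameters for (a), while its non-integrality for t = 6 rules out 6-designs for (b).

```python
from math import comb
from fractions import Fraction

n = 48
A = {12: 17296, 16: 535095, 20: 3995376, 24: 7681680,
     28: 3995376, 32: 535095, 36: 17296}

def lam(k, t):
    """lambda of the putative t-(48, k, lambda) design held by the weight-k codewords."""
    return Fraction(A[k] * comb(k, t), comb(n, t))

# part (a): lambda for t = 5 is an integer for every weight class
vals = [lam(k, 5) for k in sorted(A)]
assert all(v.denominator == 1 for v in vals)
print([int(v) for v in vals])   # [8, 1365, 36176, 190680, 229320, 62930, 3808]

# part (b): were any of these a 6-design, lambda for t = 6 would be an integer
print(all(lam(k, 6).denominator > 1 for k in A))   # True
```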
9.4 The Balance Principle and the shadow
Because extremal codes have the highest possible minimum weight for codes of their type
and those of Types II, III, and IV hold designs, a great deal of effort has gone into either
constructing these codes or showing they do not exist. Various arguments, including those
involving the Balance Principle and the shadow code, have been used. We begin with the
Balance Principle, which is also useful in gluing together self-orthogonal codes as we will
see in Section 9.7.
Theorem 9.4.1 ([177, 273]) Let C be a self-dual [n, n/2] code. Choose a set P_{n1} of coordinate
positions of size n1 and let P_{n2} be the complementary set of coordinate positions of
size n2 = n − n1. Let Ci be the subcode of C all of whose codewords have support in P_{ni}.
The following hold:
(i) (Balance Principle)

    dim C1 − n1/2 = dim C2 − n2/2.
(ii) If we reorder coordinates so that P_{n1} is the left-most n1 coordinates and P_{n2} is the
right-most n2 coordinates, then C has a generator matrix of the form

          A O
    G =   O B ,                (9.3)
          D E

where [A O] is a generator matrix of C1 and [O B] is a generator matrix of C2, O
being the appropriate size zero matrix. We also have:
(a) If ki = dim C i , then D and E each have rank (n/2) − k1 − k2 .
(b) Let A be the code of length n 1 generated by A, A D the code of length n 1 generated
by the rows of A and D, B the code of length n 2 generated by B, and B E the code
of length n 2 generated by the rows of B and E. Then A⊥ = A D and B ⊥ = B E .
Proof: We first prove (ii). By definition of C i , a generator matrix for C can be found as
given, where A and B have k1 and k2 rows, respectively. Also, as C i is the maximum size
code all of whose codewords have support in P ni , and the rows of G are independent,
(ii)(a) must hold; otherwise, the size of C 1 or C 2 could be increased. The (n/2) − k2 rows
of A and D must be independent, as otherwise we can again increase the size of C2. Thus
dim A_D = (n/2) − k2. Furthermore, as C is self-dual, A ⊆ A_D⊥, implying

    k1 + (n/2) − k2 ≤ n1.                (9.4)

Similarly, dim B_E = (n/2) − k1, B ⊆ B_E⊥, and

    k2 + (n/2) − k1 ≤ n2.                (9.5)
Adding (9.4) and (9.5), we obtain n ≤ n 1 + n 2 . Since we have equality here, we must have
equality in both (9.4) and (9.5) implying (ii)(b). Part (i) is merely the statement that equality
holds in (9.4).
Example 9.4.2 In this example, we show there is no [10, 5, 4] self-dual binary code; such
a code would meet the bound of Theorem 9.3.1(i). Suppose C is such a code, where we
write a generator matrix for C in form (9.3) with the first n 1 coordinates being the support
of a weight 4 codeword. Then A has one row, and by the Balance Principle, B has two
rows. As C contains the all-one codeword of length 10, the matrix B generates a [6, 2, 4]
self-orthogonal code containing the all-one vector of length 6. This is clearly impossible
and C does not exist.
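The Balance Principle itself is easy to test by machine on a small code. The sketch below is our own illustration (the [6, 3, 2] code generated by 110000, 001100, 000011 and the coordinate split are our choices, not from the example): it computes the dimensions of the two supported subcodes and checks Theorem 9.4.1(i).

```python
n = 6

def span(gens):
    """All F2-linear combinations of the generators (vectors as bitmasks)."""
    s = {0}
    for g in gens:
        s |= {v ^ g for v in s}
    return s

# a [6,3,2] self-dual binary code
code = span([0b110000, 0b001100, 0b000011])

def dim_supported(mask):
    """Dimension of the subcode whose codewords have support inside mask."""
    sub = [c for c in code if c & ~mask & ((1 << n) - 1) == 0]
    return len(sub).bit_length() - 1      # subcode size is a power of two

P1, P2 = 0b110000, 0b001111               # n1 = 2 left, n2 = 4 right coordinates
k1, k2 = dim_supported(P1), dim_supported(P2)
print(k1, k2, k1 - 1 == k2 - 2)           # 1 2 True: dim C1 - n1/2 = dim C2 - n2/2
```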
Exercise 507 Let C be an [18, 9] Type I code.
(a) Show that the bound on the minimum weight of C in Theorem 9.3.1(i) is d ≤ 6.
(b) Show that there is no [18, 9, 6] Type I code.
Example 9.4.3 In Research Problem 9.3.7, we pose the question of finding all [48, 24, 12]
Type II codes C without a lengthy computer search. By permuting coordinates, we can
assume C has generator matrix G in the form (9.3), where n 1 = n 2 = 24 and the first n 1
coordinates are the support of a weight 24 codeword. Koch proved in [176] that 1 ≤ k1 ≤ 4.
It can be shown that every weight 12 codeword has support disjoint from the support of
another weight 12 codeword and so, using the sum of these codewords as our weight 24
codeword, we may assume that 2 ≤ k1 ≤ 4. In [140], a lengthy computer search showed
that if k1 = 4, then C is the extended quadratic residue code. The values k1 = 2 and k1 = 3
were eliminated by another long computer search in [141].
Let C be an [n, n/2] Type I code; C has a unique [n, (n/2) − 1] subcode C 0 consisting
of all codewords in C whose weights are multiples of four; that is, C 0 is the doubly-even
subcode of C. Thus the weight distribution of C 0 is completely determined by the weight
distribution of C. Hence the weight distribution of C0⊥ is determined by the MacWilliams
equations. It can happen that the latter has negative or fractional coefficients even though
the weight distribution of C does not, implying that C does not exist. This occurs for both
[10, 5, 4] and [18, 9, 6] Type I codes and other Type I codes [58].
We consider the general situation of an [n, n/2] self-dual binary code C; choose a fixed
[n, (n/2) − 1] subcode C0, which is not necessarily a doubly-even subcode. Thus the code
C0⊥ contains both C0 and C, as C is self-dual, and has dimension (n/2) + 1, implying that
there are four cosets of C0 in C0⊥. Let C0⊥ = C0 ∪ C1 ∪ C2 ∪ C3, where the Ci are these cosets of
C0 in C0⊥. Two of these cosets, including C0 itself, must be the two cosets of C0 in C. We number
the cosets so that C = C0 ∪ C2; so C1 ∪ C3 is a coset of C in C0⊥. If C is Type I and C0 is
the doubly-even subcode of C, then C1 ∪ C3 is called the shadow of C [58]. The following
lemma will give us a great deal of information about the cosets C i .
Lemma 9.4.4 Let C be an [n, n/2] self-dual binary code with C 0 an [n, (n/2) − 1] subcode.
The following hold:
(i) If {i, j, k} = {1, 2, 3}, then the sum of a vector in C i plus a vector in C j is a vector
in C k .
(ii) For i, j ∈ {0, 1, 2, 3}, either every vector in C i is orthogonal to every vector in C j or
no vector in C i is orthogonal to any vector in C j .
The proof is left as an exercise. We give notation to the two possibilities in
Lemma 9.4.4(ii). If every vector in C i is orthogonal to every vector in C j , we denote
this by C i ⊥C j ; if no vector in C i is orthogonal to any vector in C j , we denote this by C i /C j .
Notice that by definition C 0 ⊥C i for all i.
Exercise 508 Prove Lemma 9.4.4.
Example 9.4.5 Let C be the [6, 3, 2] self-dual binary code with generator matrix

        1 1 0 0 0 0
G  =    0 0 1 1 0 0 .
        0 0 0 0 1 1

Let C0 be the doubly-even subcode of C. Then a generator matrix for C0⊥ is

        1 1 1 1 0 0
G0⊥ =   0 0 1 1 1 1 ,
        1 1 0 0 0 0
        1 0 1 0 1 0

where the first two rows generate C0. Let c1 = 101010, c2 = 110000, and c3 = c1 + c2 =
011010. Then the cosets Ci of C0 in C0⊥ for 1 ≤ i ≤ 3 are Ci = ci + C0. Notice that the
shadow C1 ∪ C3 is the coset c1 + C of C in C0⊥; it consists of all the odd weight vectors in
C0⊥. Also 1 ∈ C2.
Exercise 509 With C and C i as in Example 9.4.5, decide whether C i ⊥C j or C i /C j for all
i and j (including i = j).
Table 9.4 Orthogonality for an even weight coset of a self-dual code

      C0   C2   C1   C3
C0    ⊥    ⊥    ⊥    ⊥
C2    ⊥    ⊥    /    /
C1    ⊥    /    ⊥    /
C3    ⊥    /    /    ⊥
Exercise 510 Let C be an [n, k] self-orthogonal code over Fq. Let v ∈ F_q^n with v ∉
C⊥. Show that C0 = {c ∈ C | c · c1 = 0 for all c1 ∈ v + C} is a subcode of C of dimension
k − 1.
Exercise 511 Let C be a self-dual binary code of length n. Show that there is a one-to-one
correspondence between the subcodes of C of dimension (n/2) − 1 and the cosets of C.
Hint: See Exercise 510.
In general, we can choose C0 to be any [n, (n/2) − 1] subcode of C and obtain the four
cosets Ci of C0 in C0⊥. Alternately, we can choose a coset C′ of C and let C0 = (C ∪ C′)⊥.
Then C0⊥ = C0 ∪ C1 ∪ C2 ∪ C3, where C = C0 ∪ C2 and C′ = C1 ∪ C3.
Two possibilities arise for the weights of the cosets Ci depending on whether or not
1 ∈ C0. Recall from Exercise 72 that a coset of C, such as C1 ∪ C3, has all even weight
vectors or all odd weight vectors. If 1 ∈ C0, then all codewords in C0⊥ must be even weight,
implying that C1 ∪ C3 has only even weight vectors, or equivalently, C1 ∪ C3 is an even
weight coset of C. Notice that, in this case, C0 ∪ C1 and C0 ∪ C3 are self-dual codes. If
1 ∉ C0, then C0⊥ is not an even code and C must be the even weight subcode of C0⊥. Thus C1 ∪ C3
has only odd weight vectors, or equivalently, C1 ∪ C3 is an odd weight coset of C. We
summarize this in the following lemma.
Lemma 9.4.6 Let C be an [n, n/2] self-dual binary code with C 0 an [n, (n/2) − 1] subcode.
The following are the only possibilities:
(i) If 1 ∈ C0, then C1 ∪ C3 is an even weight coset of C and contains only even weight
vectors. Also C0 ∪ C1 and C0 ∪ C3 are self-dual codes.
(ii) If 1 ∉ C0, then 1 ∈ C2. Furthermore, C1 ∪ C3 is an odd weight coset of C and contains
only odd weight vectors.
We investigate the two possibilities that arise in Lemma 9.4.6. In each case we obtain an
orthogonality table showing whether or not C i is orthogonal to C j . We also obtain additional
information about the weights of vectors in C 1 and C 3 .
Suppose that C 1 ∪ C 3 is an even weight coset of C. Then Table 9.4 gives the orthogonality
relationships between the cosets [38]; these follow from Lemma 9.4.4, with verification left
as an exercise.
Exercise 512 Verify the entries in Table 9.4.
We give additional information about the cosets that arise in Table 9.4.
Theorem 9.4.7 Let C be an [n, n/2] self-dual binary code with C 0 an [n, (n/2) − 1] subcode. Suppose that 1 ∈ C 0 . The following hold:
(i) If C is Type II, C 1 and C 3 can be numbered so that all vectors in C 1 have weight 0
modulo 4, and all vectors in C 3 have weight 2 modulo 4.
(ii) If C and C 0 are singly-even, then half the vectors in C i are singly-even and half are
doubly-even for i = 1 and 3.
(iii) (Shadow case) If C is singly-even and C 0 is the doubly-even subcode of C and if
c ∈ C 1 ∪ C 3 , then wt(c) ≡ n/2 (mod 4). Also n ≡ 0 (mod 8) or n ≡ 4 (mod 8).
Proof: We leave the proofs of (i) and (ii) as an exercise. Suppose that C is singly-even and
C0 is the doubly-even subcode of C. Let ci ∈ Ci. Then c1 + c3 ∈ C2 by Lemma 9.4.4, implying that wt(c1 + c3) ≡ 2 (mod 4). But wt(c1 + c3) = wt(c1) + wt(c3) − 2wt(c1 ∩ c3) ≡
wt(c1) + wt(c3) − 2 (mod 4) as C1 /C3. Since both wt(c1) and wt(c3) are even, wt(c1) ≡
wt(c3) (mod 4).
If n ≢ 0 (mod 8), then all weights in C1 and C3 must be 2 modulo 4, as otherwise C0 ∪ C1
or C0 ∪ C3 would be a Type II code, which is a contradiction. Because 1 ∈ C0, n ≡ 4 (mod 8),
and (iii) follows in this case. If n ≡ 0 (mod 8), then every [n, (n/2) − 1] doubly-even code
that contains 1, such as C0, is contained in one Type I code and two Type II codes; see
Exercise 526. These three codes must be in C0⊥; since C = C0 ∪ C2 is Type I, C0 ∪ C1 and
C0 ∪ C3 must be the Type II codes. Thus wt(c) ≡ 0 ≡ n/2 (mod 4).
Exercise 513 Prove Theorem 9.4.7(i) and (ii).
Example 9.4.8 It is known [262] that there is a unique [12, 6, 4] Type I code C, up to
equivalence. Let C 0 be the doubly-even subcode of C, and let C 1 ∪ C 3 be the shadow of C.
So C0⊥ = C0 ∪ C1 ∪ C2 ∪ C3, where C = C0 ∪ C2. As can be seen from the generator matrix
for C, the shadow C1 ∪ C3 of C is a weight 2 coset of C with six weight 2 vectors in the
coset. Using Table 9.4 and Theorem 9.4.7, the weight distribution of each Ci is

      0    2    4    6    8   10   12
C0    1        15        15        1
C2                  32
C1         6        20         6
C3                  32

where we assume that C1 contains a weight 2 vector. C0 ∪ C1 is a [12, 6, 2] self-dual
code with a coset of weight 6, while C0 ∪ C3 is a [12, 6, 4] self-dual code with a coset of
weight 2.
Exercise 514 Verify that the weight distribution of each C i given in Example 9.4.8 is
correct. Note that you must show that if C 1 contains one weight 2 vector, it contains all six
weight 2 vectors in C 1 ∪ C 3 .
Example 9.4.9 Continuing with Example 9.4.8, let C be a [12, 6, 4] Type I code. It turns
out that there are two types of cosets of C of weight 2. There is a unique coset of weight
2 described in Example 9.4.8. There are 30 other cosets of weight 2, each containing two
vectors of weight 2. Let C ′ be one of these. Let C 0 = (C ∪ C ′ )⊥ . The code C 0 is singly-even
Table 9.5 Orthogonality for an odd weight coset of a self-dual code

      C0   C2   C1   C3
C0    ⊥    ⊥    ⊥    ⊥
C2    ⊥    ⊥    /    /
C1    ⊥    /    /    ⊥
C3    ⊥    /    ⊥    /
and contains 1. Again using Table 9.4 and Theorem 9.4.7, the weight distribution of each Ci is
given by

      0    2    4    6    8   10   12
C0    1         7   16    7        1
C2              8   16    8
C1         2    8   12    8    2
C3              8   16    8

where we assume that C1 contains a weight 2 vector. C0 ∪ C1 is a [12, 6, 2] self-dual
code with a coset of weight 4, while C0 ∪ C3 is a [12, 6, 4] self-dual code with a coset of
weight 2.
Exercise 515 Verify that the weight distribution of each C i given in Example 9.4.9 is
correct. As in Exercise 514 you must show that if C 1 contains one weight 2 vector, it
contains both weight 2 vectors in C 1 ∪ C 3 .
We now turn to the second possibility given in Lemma 9.4.6(ii). Table 9.5 gives the
orthogonality relationships between the cosets [38] when C 1 ∪ C 3 contains only odd weight
vectors.
Exercise 516 Verify the entries in Table 9.5.
We have a result analogous to that of Theorem 9.4.7.
Theorem 9.4.10 Let C be an [n, n/2] self-dual binary code with C0 an [n, (n/2) − 1]
subcode. Suppose that 1 ∉ C0. The following hold:
(i) If C is Type II, C 1 and C 3 can be numbered so that all vectors in C 1 have weight 1
modulo 4, and all vectors in C 3 have weight 3 modulo 4.
(ii) If C 0 is singly-even, then, for i = 1 and 3, half the vectors in C i have weight 1 modulo
4 and half the vectors in C i have weight 3 modulo 4.
(iii) (Shadow case) If C is singly-even and C 0 is the doubly-even subcode of C and if
c ∈ C 1 ∪ C 3 , then wt(c) ≡ n/2 (mod 4). Also n ≡ 2 (mod 8) or n ≡ 6 (mod 8).
Proof: We leave the proofs of (i) and (ii) as an exercise. Suppose that C is singly-even and
C0 is the doubly-even subcode of C. We note that since 1 ∉ C0, then 1 ∈ C2. As C2 can only
contain singly-even vectors, either n ≡ 2 (mod 8) or n ≡ 6 (mod 8).
We construct a new code C′′ of length n + 2 by adjoining two components to C0⊥, where

    C′′ = {c00 | c ∈ C0} ∪ {c11 | c ∈ C2} ∪ {c10 | c ∈ C1} ∪ {c01 | c ∈ C3}.        (9.6)
Clearly, C ′′ is an [n + 2, (n/2) + 1] code. By Exercise 517, C ′′ is self-dual.
Consider first the case n ≡ 2 (mod 8). Let ci ∈ C i . Then c1 + c3 ∈ C 2 by Lemma 9.4.4 implying that wt(c1 + c3 ) ≡ 2 (mod 4). But wt(c1 + c3 ) = wt(c1 ) + wt(c3 ) − 2wt(c1 ∩ c3 ) ≡
wt(c1 ) + wt(c3 ) (mod 4) as C 1 ⊥C 3 . Since both wt(c1 ) and wt(c3 ) are odd, wt(c1 ) ≡ wt(c3 )
(mod 4). As C ′′ cannot be Type II since n + 2 ≡ 4 (mod 8), wt(c1 ) ≡ wt(c3 ) ≡ 1 ≡
n/2 (mod 4), proving (iii) in this case.
Now suppose n ≡ 6 (mod 8). Let D = {c00 | c ∈ C0} ∪ {c11 | c ∈ C2}, which is doubly-even. Clearly, if D′ = D ∪ (d + D), where d = 00 · · · 011, then D′ is a Type I code. Also as the
all-one vector of length n is in C 2 , the all-one vector of length n + 2 is in D. By Exercise 526,
D is contained in only one Type I and two Type II codes of length n + 2, implying that C ′′
must be Type II. Hence if c ∈ C 1 ∪ C 3 , then wt(c) ≡ 3 ≡ n/2 (mod 4), proving (iii) in this
case also.
Exercise 517 Show that the code constructed in (9.6) is self-dual.
Exercise 518 Prove Theorem 9.4.10(i) and (ii).
Example 9.4.11 Let C be a [6, 3, 2] Type I code with C0 its doubly-even subcode. See
Example 9.4.5. As 1 ∈ C2, we can apply Theorem 9.4.10 and use Table 9.5 to obtain the
weight distribution of each Ci:

      0    1    2    3    4    5    6
C0    1                   3
C2              3                   1
C1                   4
C3                   4
Exercise 519 Verify that the weight distribution of each C i given in Example 9.4.11 is
correct. Do this in two ways. First, use Theorem 9.4.10 along with Table 9.5, and second,
use the construction presented in Example 9.4.5.
The upper bound on the minimum distance of Type I codes given in Theorem 9.3.1(i)
was first improved in [58]; this bound was later improved to the current bound given in
Theorem 9.3.5. In proving the intermediate bound, the weight enumerator of the shadow
was considered; if the weight enumerator of the shadow had either negative or fractional
coefficients, the original code could not exist. Using this idea, Conway and Sloane in [58]
have given possible weight enumerators of Type I codes of lengths through 64 with the
highest possible minimum weight. A number of researchers have found Type I codes with
a weight enumerator given as a possibility in [58].
If C is an [n, n/2, d] self-dual code with d > 2, pick two coordinate positions and consider the ((n/2) − 1)-dimensional subcode C′ of C with either two 0s or two 1s in these
positions. (The requirement d > 2 guarantees that C′ is ((n/2) − 1)-dimensional.) If we
puncture C′ on these positions, we obtain a self-dual code C′∗ of length n − 2; C′∗ is
called a child of C and C is called a parent³ of C′∗. This process reverses the construction
of (9.6).
Example 9.4.12 There is a [22, 11, 6] Type I code whose shadow has minimum weight 7 and the
extension in (9.6) gives the [24, 12, 8] extended binary Golay code. The length 22 code is
a child of the Golay code, sometimes called the odd Golay code.
If n ≡ 6 (mod 8) and C is a Type I code with C0 the doubly-even subcode, then 1 ∉ C0.
Thus when n ≡ 6 (mod 8) our proof of Theorem 9.4.10 shows that any Type I code of length
n can be extended to a Type II code of length n + 2, denoted C′′ in the proof. Example
9.4.12 illustrates this.
If n ≡ 0 (mod 8) and C is a Type II code of length n, it is also possible to define children
of C of lengths n − 4 and n − 6 that are Type I codes. It can be shown [53, 264] that any
Type I code of length n ≡ 0 (mod 8) is a child of a Type II code of the next larger length
equivalent to 0 modulo 8. This approach was used to classify all self-dual codes of lengths
26, 28, and 30 once all Type II codes of length 32 had been classified. See [53, 56, 264].
We can determine the weight enumerator of a parent from that of a child.
Theorem 9.4.13 Let C be a Type I code of length n ≡ 6 (mod 8) with C 0 its doubly-even
subcode. Suppose that Wi (x, y) is the weight enumerator of C i . Then the weight enumerator
of the parent C ′′ in (9.6) is
    W_{C′′}(x, y) = y^2 W0(x, y) + x^2 W2(x, y) + xy[W1(x, y) + W3(x, y)].
Exercise 520 Prove Theorem 9.4.13.
Exercise 521 Do the following:
(a) Using Theorem 9.4.13 and the table of Example 9.4.11, find the weight distribution of
the parent of a [6, 3, 2] Type I code.
(b) Find a generator matrix for a [6, 3, 2] Type I code whose first two rows generate its
doubly-even subcode. Using the method described by (9.6), find a generator matrix for
the [8, 4, 4] parent.
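Part (b) of Exercise 521 can also be carried out by machine. The sketch below is our own illustration using the [6, 3, 2] code of Example 9.4.5 (coset representatives as chosen there): it applies construction (9.6) and checks that the parent is an [8, 4, 4] code all of whose weights are divisible by four, i.e. Type II.

```python
def wt(a): return bin(a).count("1")

def span(gens):
    s = {0}
    for g in gens:
        s |= {v ^ g for v in s}
    return s

C = span([0b110000, 0b001100, 0b000011])     # the [6,3,2] Type I code
C0 = {c for c in C if wt(c) % 4 == 0}        # its doubly-even subcode
C2 = C - C0
C1 = {0b101010 ^ x for x in C0}              # cosets via the reps of Example 9.4.5
C3 = {0b011010 ^ x for x in C0}

# construction (9.6): append 00, 11, 10, 01 to C0, C2, C1, C3 respectively
parent = ({c << 2 for c in C0} | {(c << 2) | 0b11 for c in C2}
          | {(c << 2) | 0b10 for c in C1} | {(c << 2) | 0b01 for c in C3})

print(len(parent), min(wt(c) for c in parent if c))   # 16 4
print(all(wt(c) % 4 == 0 for c in parent))            # True: the parent is Type II
```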
With the notion of parents and children we can say more about Type I codes of
length n ≡ 22 (mod 24) meeting the bound of Theorem 9.3.5. This result is due to Rains
[290].
Theorem 9.4.14 Let C be an [n, n/2, d] Type I code with n ≡ 22 (mod 24) and d =
4⌊n/24⌋ + 6. Then C is the child of a Type II [n + 2, (n/2) + 1, d + 2] code.
In proving Theorem 9.4.14, we note that if C0 is the doubly-even subcode of C, then
1 ∉ C0. Thus the shadow has odd weight. The difficult part of the proof is showing that the
shadow is a coset of C of weight d + 1. Once we know that, the parent of C has minimum
weight d + 2. Example 9.4.12 is an illustration of this result.
³ In Section 11.7 we use the terms “parent” and “child” in relation to cosets of a code. The current use is unrelated.
9.5 Counting self-orthogonal codes
As we will see in Section 9.7, the classification of self-dual codes of a fixed length n over
F2 , F3 , or F4 depends on knowledge of the number of such codes. We begin with the binary
case, where we will count the number of self-dual codes and the number of Type II codes.
We first determine the total number of self-dual codes.
Theorem 9.5.1 ([259, 260]) The number of self-dual binary codes of length n is

    ∏_{i=1}^{n/2−1} (2^i + 1).
Proof: A self-dual binary code must contain the all-one vector 1. Let σ_{n,k} denote the number
of [n, k] self-orthogonal binary codes containing 1. We note first that σ_{n,1} = 1. We now
find a recurrence relation for σ_{n,k}. We illustrate this recursion process by computing σ_{n,2}.
There are 2^{n−1} − 2 even weight vectors that are neither 0 nor 1. Each of these vectors is
in a unique [n, 2] self-orthogonal code containing 1; each such code contains two of these
vectors. So σ_{n,2} = 2^{n−2} − 1.
Every [n, k + 1] self-orthogonal code containing 1 contains an [n, k] self-orthogonal
code also containing 1. Beginning with an [n, k] self-orthogonal code C containing 1,
the only way to find an [n, k + 1] self-orthogonal code C′ containing C is by adjoining a
vector c′ from one of the 2^{n−2k} − 1 cosets of C in C⊥ unequal to C itself. Thus C can be
extended to 2^{n−2k} − 1 different [n, k + 1] self-orthogonal codes C′. However, every such
C′ has 2^k − 1 subcodes of dimension k containing 1 by Exercise 522(a). This shows that

    σ_{n,k+1} = [(2^{n−2k} − 1)/(2^k − 1)] σ_{n,k}.
Note that using this recurrence relation, we obtain σ_{n,2} = 2^{n−2} − 1 as earlier. Using this
recurrence relation, the number of self-dual binary codes of length n is

    σ_{n,n/2} = [(2^2 − 1)/(2^{n/2−1} − 1)] · [(2^4 − 1)/(2^{n/2−2} − 1)] · [(2^6 − 1)/(2^{n/2−3} − 1)] · · · [(2^{n−2} − 1)/(2^1 − 1)] · σ_{n,1}
              = [(2^2 − 1)/(2^1 − 1)] · [(2^4 − 1)/(2^2 − 1)] · [(2^6 − 1)/(2^3 − 1)] · · · [(2^{n−2} − 1)/(2^{n/2−1} − 1)]
              = (2^1 + 1)(2^2 + 1)(2^3 + 1) · · · (2^{n/2−1} + 1),

which is the desired result.
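The recurrence and the closed-form product can be cross-checked mechanically. The sketch below is our own verification: it iterates σ_{n,k+1} = (2^{n−2k} − 1)σ_{n,k}/(2^k − 1) with exact rational arithmetic and compares the result with the product.

```python
from fractions import Fraction
from math import prod

def sigma_recurrence(n):
    """sigma_{n,n/2} obtained by iterating the recurrence from sigma_{n,1} = 1."""
    s = Fraction(1)
    for k in range(1, n // 2):
        s *= Fraction(2 ** (n - 2 * k) - 1, 2 ** k - 1)
    return s

for n in (4, 6, 8, 10, 12):
    closed = prod(2 ** i + 1 for i in range(1, n // 2))
    assert sigma_recurrence(n) == closed
print(sigma_recurrence(12))   # 75735 = 3 * 5 * 9 * 17 * 33
```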
Exercise 522 In Exercise 431, you were asked to show that there are (q^r − 1)/(q − 1)
subcodes of dimension r − 1 in an r-dimensional code over Fq. Do the following:
(a) Prove that a binary code of dimension k + 1 containing the all-one vector 1 has 2^k − 1
subcodes of dimension k also containing 1.
(b) Prove that a binary code of dimension k + 1 containing an [n, m] subcode C has
2^{k+1−m} − 1 subcodes of dimension k also containing C.
Example 9.5.2 Theorem 9.5.1 states that there are (2^1 + 1) = 3 [4, 2] self-dual binary
codes. They have generator matrices

1 1 0 0      1 0 1 0             1 0 0 1
0 0 1 1 ,    0 1 0 1 ,    and    1 1 1 1 .
Notice that these are all equivalent and so up to equivalence there is only one [4, 2] self-dual
binary code.
Exercise 523 Verify Theorem 9.5.1 directly for n = 6 by constructing generator matrices
for all [6, 3] self-dual binary codes.
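Exercise 523 can also be done exhaustively by computer. The sketch below is our own check (it does not replace writing down the generator matrices): it enumerates triples of pairwise orthogonal even weight vectors in F_2^6 and collects the distinct 3-dimensional self-orthogonal, hence self-dual, spans.

```python
from itertools import combinations

n = 6
def dot(a, b): return bin(a & b).count("1") & 1

codes = set()
for gens in combinations(range(1, 1 << n), 3):
    # every generator must be orthogonal to every generator
    # (including itself, which forces even weight)
    if any(dot(x, y) for x in gens for y in gens):
        continue
    span = {0}
    for g in gens:
        span |= {v ^ g for v in span}
    if len(span) == 8:                # generators independent: dimension 3 = n/2
        codes.add(frozenset(span))
print(len(codes))                     # 15 = (2^1 + 1)(2^2 + 1)
```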
We can also count the number of self-orthogonal binary codes of maximum dimension
when the code has odd length. The proof of this result is found in [259], where the number
of [n, k] self-orthogonal codes with k ≤ (n − 1)/2 is also given.
Theorem 9.5.3 If n is odd, the number of [n, (n − 1)/2] self-orthogonal binary codes is

    ∏_{i=1}^{(n−1)/2} (2^i + 1).
The next theorem extends Theorem 9.5.1.
Theorem 9.5.4 If C is an [n, m] self-orthogonal binary code containing 1, then C is contained in

    ∏_{i=1}^{n/2−m} (2^i + 1)

self-dual codes.
Proof: For k ≥ m, let τ_{n,k} be the number of [n, k] self-orthogonal codes containing C. Note
that τ_{n,m} = 1. Beginning with an [n, k] self-orthogonal code C1 containing C and adjoining
an element of a coset of C1 in C1⊥ provided the element is not in C1, we obtain 2^{n−2k} − 1
possible [n, k + 1] self-orthogonal codes C2 containing C1. By Exercise 522(b), there are
2^{k+1−m} − 1 subcodes of C2 of dimension k containing C. Thus

    τ_{n,k+1} = [(2^{n−2k} − 1)/(2^{k+1−m} − 1)] τ_{n,k}.

Using this recurrence relation, the number of self-dual binary codes of length n containing
C is

    τ_{n,n/2} = [(2^2 − 1)/(2^{n/2−m} − 1)] · [(2^4 − 1)/(2^{n/2−m−1} − 1)] · [(2^6 − 1)/(2^{n/2−m−2} − 1)] · · · [(2^{n−2m} − 1)/(2^1 − 1)] · τ_{n,m}
              = [(2^2 − 1)/(2^1 − 1)] · [(2^4 − 1)/(2^2 − 1)] · [(2^6 − 1)/(2^3 − 1)] · · · [(2^{n−2m} − 1)/(2^{n/2−m} − 1)]
              = (2^1 + 1)(2^2 + 1)(2^3 + 1) · · · (2^{n/2−m} + 1),

which is the desired result.
Exercise 524 Prove Theorem 9.5.1 as a corollary of Theorem 9.5.4.
Now we would like to count the number of Type II codes, which is more complicated
than counting the number of self-dual codes. We will state the results here and will leave
most proofs to Section 9.12. We have the following theorem counting the number of Type
II codes.
Theorem 9.5.5 Let n ≡ 0 (mod 8).
(i) Then there are

    ∏_{i=0}^{n/2−2} (2^i + 1)

Type II codes of length n.
(ii) Let C be an [n, k] doubly-even code containing 1. Then C is contained in

    ∏_{i=0}^{n/2−k−1} (2^i + 1)

Type II codes of length n.
Example 9.5.6 Applying Theorem 9.5.5 to n = 8, we see that there are (2^0 + 1)(2^1 + 1)(2^2 + 1) =
30 Type II [8, 4, 4] codes. By Exercise 56, all these codes are equivalent.
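The count of 30 can be confirmed by exhaustive search for small lengths. The sketch below is our own verification (the function name `count_type_II` is ours): it grows doubly-even, mutually orthogonal generating sets one vector at a time until the span reaches dimension n/2, then counts the distinct codes.

```python
def wt(a): return bin(a).count("1")
def dot(a, b): return bin(a & b).count("1") & 1

def count_type_II(n):
    """Exhaustively count doubly-even self-dual (Type II) codes of length n."""
    de = [v for v in range(1, 1 << n) if wt(v) % 4 == 0]   # doubly-even vectors
    codes = set()

    def grow(span, start):
        if len(span) == 1 << (n // 2):       # reached dimension n/2: self-dual
            codes.add(frozenset(span))
            return
        for idx in range(start, len(de)):
            v = de[idx]
            if v not in span and all(dot(v, c) == 0 for c in span):
                grow(span | {v ^ c for c in span}, idx + 1)

    grow({0}, 0)
    return len(codes)

print(count_type_II(8))   # 30, matching Theorem 9.5.5(i)
```

Sums of orthogonal doubly-even binary vectors are again doubly-even, so checking the generators suffices.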
Exercise 525 The matrix

      1 1 1 1 0 0 0 0
G  =  1 1 0 0 1 1 0 0
      1 1 1 1 1 1 1 1

is the generator matrix of an [8, 3, 4] doubly-even binary code C. How many Type II codes
are there that contain C?
Exercise 526 Prove that if n ≡ 0 (mod 8), then every [n, (n/2) − 1] doubly-even code
containing 1 is contained in one Type I code and two Type II codes.
When n is a multiple of eight, the number of [n, k] doubly-even binary codes with
k ≤ n/2 has been counted; for example, see Exercise 567 for the number of such codes
containing 1.
We can also count the number of maximal doubly-even binary codes of length n in the
case when 8 ∤ n.
Theorem 9.5.7 The number of [n, (n/2) − 1] doubly-even binary codes is:
(i) ∏_{i=2}^{n/2−1} (2^i + 1) if n ≡ 4 (mod 8), and
(ii) ∏_{i=1}^{n/2−1} (2^i + 1) if n ≡ 2 (mod 4).
The proof of Theorem 9.5.7(i) is left to Section 9.12, but the proof of (ii) is quite
easy once we observe that the count in (ii) agrees with the count in Theorem 9.5.1. As
the proof is instructive, we give it here. Let C be a self-dual code of length n where
n ≡ 2 (mod 4). Then, C is singly-even as C contains 1. There is a unique [n, (n/2) − 1]
doubly-even subcode C 0 by Theorem 1.4.6. Conversely, if C 0 is an [n, (n/2) − 1]
doubly-even code, it does not contain 1 as n ≡ 2 (mod 4). So the only self-dual code
containing C 0 must be obtained by adjoining 1. Thus the number of doubly-even
[n, (n/2) − 1] binary codes equals the number of self-dual codes; (ii) now follows by
Theorem 9.5.1.
Example 9.5.8 The [6, 2] doubly-even code C0 with generator matrix

1 1 1 1 0 0
0 0 1 1 1 1

is contained in the self-dual code C with generator matrix either

1 1 1 1 0 0        1 1 0 0 0 0
0 0 1 1 1 1   or   0 0 1 1 0 0 .
1 1 1 1 1 1        0 0 0 0 1 1
Example 9.5.9 Notice that if n = 4, the product in Theorem 9.5.7(i) is empty and so
equals 1. The unique doubly-even [4, 1] code is C = {0, 1}. All three self-dual codes given
in Example 9.5.2 contain C.
Exercise 527 Do the following:
(a) Prove that if n ≡ 4 (mod 8), there are three times as many self-dual binary codes of
length n as there are doubly-even [n, (n/2) − 1] binary codes.
(b) Prove that if n ≡ 4 (mod 8) and if C is a doubly-even [n, (n/2) − 1] binary code, then
C is contained in exactly three self-dual [n, n/2] codes. Hint: Consider the four cosets
of C in the [n, (n/2) + 1] code C ⊥ .
The number of doubly-even binary codes of odd length n and dimension k < n/2 has
also been computed in [92].
There are analogous formulas for the number of Type III and Type IV codes. We state
them here.
Theorem 9.5.10 The following hold:
(i) [259, 260] The number of Type III codes over F3 of length n ≡ 0 (mod 4) is

    2 ∏_{i=1}^{n/2−1} (3^i + 1).

(ii) [217] The number of Type IV codes over F4 of length n ≡ 0 (mod 2) is

    ∏_{i=0}^{n/2−1} (2^{2i+1} + 1).
The number of self-orthogonal ternary codes of length n with 4 ∤ n is also known; see
[260]. We state the number of maximal ones.
Theorem 9.5.11 The following hold:
(i) If n is odd, the number of [n, (n − 1)/2] self-orthogonal ternary codes is

    ∏_{i=1}^{(n−1)/2} (3^i + 1).

(ii) If n ≡ 2 (mod 4), the number of [n, (n − 2)/2] self-orthogonal ternary codes is

    ∏_{i=2}^{n/2} (3^i + 1).
Exercise 528 In a Hermitian self-dual code over F4, all codewords have even weight and
any even weight vector in F_4^n is orthogonal to itself under the Hermitian inner product. Let
ν_n be the number of even weight vectors in F_4^n.
(a) Show that ν_n satisfies the recurrence relation ν_n = ν_{n−1} + 3(4^{n−1} − ν_{n−1}) with ν_1 = 1.
(b) Solve the recurrence relation in part (a) to show that

    ν_n = 2 · 4^{n−1} − (−2)^{n−1}.

(c) Show that the number of [n, 1] Hermitian self-orthogonal codes is

    (2 · 4^{n−1} − (−2)^{n−1} − 1)/3.
We conclude this section with a bound, found in [219], analogous to the Gilbert and
Varshamov Bounds of Chapter 2.
Theorem 9.5.12 Let n ≡ 0 (mod 8). Let r be the largest integer such that

    (n choose 4) + (n choose 8) + (n choose 12) + · · · + (n choose 4(r − 1)) < 2^{n/2−2} + 1.        (9.7)

Then there exists a Type II code of length n and minimum weight at least 4r.
Proof: First, let v be a vector in F_2^n of weight 4j with 0 < 4j < 4r. The number of Type II
codes containing the subcode {0, 1, v, 1 + v} is ∏_{i=0}^{n/2−3} (2^i + 1) by Theorem 9.5.5(ii). Thus
there are at most

    [(n choose 4) + (n choose 8) + (n choose 12) + · · · + (n choose 4(r − 1))] ∏_{i=0}^{n/2−3} (2^i + 1)

Type II codes of minimum weight less than 4r. If this total is less than the total number of Type II codes, which is ∏_{i=0}^{n/2−2} (2^i + 1) by Theorem 9.5.5(i), then there is at least
one Type II code of minimum weight at least 4r. The latter statement is equivalent to
(9.7).
In Table 9.6, we illustrate the bound of Theorem 9.5.12 and compare it to the Varshamov
Bound in Chapter 2. The length is denoted “n” in the table. The largest value of d for which the
Varshamov Bound guarantees the existence of an [n, n/2, d] code is given in the table in the column
“dvar”. (Note that the values of d obtained from the Gilbert Bound were no larger than
the values from the Varshamov Bound.) The largest d in the bound of Theorem 9.5.12 is
Table 9.6 Type II codes of length n ≤ 96

 n    dvar   dsdII   dknown      n    dvar   dsdII   dknown
 8     3      4       4 E       56     9      8      12 E
16     4      4       4 E       64     9      8      12 E
24     5      4       8 E       72    10     12      12
32     6      4       8 E       80    11     12      16 E
40     7      8       8 E       88    12     12      16 E
48     8      8      12 E       96    13     12      16
denoted “dsdII ”. The largest minimum weight for a Type II code of length n known to exist
is denoted “dknown ”; an “E” on this entry indicates the code is extremal. Construction of
the latter codes, with references, for 8 ≤ n ≤ 88 is described in [291]; the [96, 48, 16] Type
II code was first constructed by Feit [80]. The Varshamov Bound states that codes meeting
the bound exist; where to search for them is not obvious. From the table, one sees that to
find good codes with dimension half the length, one can look at Type II codes when the
length is a multiple of eight and obtain codes close to meeting the Varshamov Bound. It is
instructive to compare Table 9.6 where we are examining codes whose minimum weights
are compared to a lower bound with Table 9.1 where we are dealing with codes whose
minimum weights are compared to an upper bound.
Exercise 529 Verify the entries in the columns dvar and dsdII of Table 9.6.
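As a computational sketch for Exercise 529 (added here, not part of the original text), the column dsdII of Table 9.6 can be generated directly from the bound (9.7):

```python
from math import comb

def dsd_type_II(n):
    # Largest r such that C(n,4) + C(n,8) + ... + C(n,4(r-1)) < 2^(n/2-2) + 1,
    # as in (9.7); the bound then guarantees a Type II code of length n with
    # minimum weight at least 4r.
    bound = 2 ** (n // 2 - 2) + 1
    r, total = 1, 0                      # r = 1 corresponds to the empty sum
    while total + comb(n, 4 * r) < bound:
        total += comb(n, 4 * r)
        r += 1
    return 4 * r

# Reproduce the dsdII column of Table 9.6:
print({n: dsd_type_II(n) for n in range(8, 97, 8)})
```

Running this gives 4 for n = 8, 16, 24, 32, then 8 for n = 40 through 64, and 12 for n = 72 through 96, matching the table.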
There is a result analogous to that of Theorem 9.5.12 for self-dual codes over Fq found
in [278]. Example 9.5.13 illustrates this for q = 2.
Example 9.5.13 Let n be even. Let r be the largest integer such that

    \binom{n}{2} + \binom{n}{4} + \binom{n}{6} + \cdots + \binom{n}{2(r-1)} < 2^{n/2-1} + 1.        (9.8)
In Exercise 530, you are asked to prove that there exists a self-dual binary code of length
n and minimum weight at least 2r . Table 9.7 is analogous to Table 9.6 for self-dual binary
codes of even length n with 4 ≤ n ≤ 50 using the bound (9.8). The column “dvar ” has the
same meaning as it does in Table 9.6. The largest d from bound (9.8) is denoted “dsd ”. The
largest minimum weight for a self-dual binary code (Type I or Type II) of length n known to
exist is denoted “dknown ”. In all cases, it has been proved by various means that there is no
self-dual binary code of length n ≤ 50 and minimum weight higher than that given in the
table under “dknown ”. For length n ≤ 32 there are no codes with higher minimum weight by
the classifications discussed in Section 9.7. For 34 < n < 50, there is no self-dual code with
higher minimum weight by Theorem 9.3.5. There are no [34, 17, 8] or [50, 25, 12] self-dual codes by [58]; if either code exists, its shadow code would have a weight enumerator
with noninteger coefficients. Furthermore, if n ≡ 0 (mod 8), the codes of highest minimum
weight are all attained by Type II codes (and possibly Type I codes as well). Again it is
worth comparing this table to Table 9.1.
Table 9.7 Self-dual binary codes of length 4 ≤ n ≤ 50

   n   dvar   dsd   dknown       n   dvar   dsd   dknown
   4    2      2      2         28    5      4      6
   6    3      2      2         30    6      4      6
   8    3      2      4         32    6      4      8
  10    3      2      2         34    6      6      6
  12    3      2      4         36    6      6      8
  14    4      2      4         38    7      6      8
  16    4      4      4         40    7      6      8
  18    4      4      4         42    7      6      8
  20    4      4      4         44    7      6      8
  22    5      4      6         46    7      6     10
  24    5      4      8         48    8      6     12
  26    5      4      6         50    8      8     10
Exercise 530 Prove that there exists a self-dual binary code of length n and minimum
weight at least 2r provided r is the largest integer satisfying (9.8).
Exercise 531 Verify the entries in the columns dvar and dsd of Table 9.7.
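A sketch for the dsd column of Exercise 531 (added here for illustration), using the bound (9.8) in the same way as for Table 9.6:

```python
from math import comb

def dsd_self_dual(n):
    # Largest r with C(n,2) + C(n,4) + ... + C(n,2(r-1)) < 2^(n/2-1) + 1,
    # as in (9.8); there is then a self-dual binary code of length n with
    # minimum weight at least 2r (Exercise 530).
    bound = 2 ** (n // 2 - 1) + 1
    r, total = 1, 0
    while total + comb(n, 2 * r) < bound:
        total += comb(n, 2 * r)
        r += 1
    return 2 * r

print([dsd_self_dual(n) for n in range(4, 51, 2)])
```

The output reproduces the dsd column of Table 9.7: the value 2 for n = 4 through 14, 4 for n = 16 through 32, 6 for n = 34 through 48, and 8 for n = 50.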
9.6 Mass formulas
We now know, from Theorem 9.5.1, that the total number of self-dual binary codes of
length n is \prod_{i=1}^{n/2-1} (2^i + 1). Ultimately we would like to find a representative from each
equivalence class of these codes; this is called a classification problem. It would be of great
help to know the number of inequivalent codes. At this point there does not seem to be a
direct formula for this. However, we can create a formula that can be used to tell us when
we have a representative from every class of inequivalent self-dual codes. This formula is
called a mass formula.
Let C1, . . . , Cs be representatives from every equivalence class of self-dual binary codes
of length n. The number of codes equivalent to Cj is

    |Sym_n| / |PAut(Cj)| = n! / |PAut(Cj)|.

Therefore, summing over j, we must obtain the total number of self-dual codes, which
yields the following mass formula for self-dual binary codes:

    \sum_{j=1}^{s} n! / |PAut(Cj)| = \prod_{i=1}^{n/2-1} (2^i + 1).        (9.9)
We can find mass formulas for Type II, III, and IV codes also, in the same fashion, using
the results of Theorems 9.5.5(i) and 9.5.10. For Type III, we use the group of all monomial
transformations, which has size 2^n n!, in place of Sym_n. For Type IV, we use the group
of all monomial transformations followed possibly by the Frobenius map, which has size
2 · 3^n n!, in place of Sym_n. This gives the following theorem.
Theorem 9.6.1 We have the following mass formulas:
(i) For self-dual binary codes of length n,

    \sum_j n! / |PAut(Cj)| = \prod_{i=1}^{n/2-1} (2^i + 1).

(ii) For Type II codes of length n,

    \sum_j n! / |PAut(Cj)| = \prod_{i=0}^{n/2-2} (2^i + 1).

(iii) For Type III codes of length n,

    \sum_j 2^n n! / |MAut(Cj)| = 2 \prod_{i=1}^{n/2-1} (3^i + 1).

(iv) For Type IV codes of length n,

    \sum_j 2 · 3^n n! / |ΓAut(Cj)| = \prod_{i=0}^{n/2-1} (2^{2i+1} + 1).

In each case, the summation is over all j, where {Cj} is a complete set of representatives
of inequivalent codes of the given type.
Example 9.6.2 We continue with Example 9.5.6 and illustrate the mass formula for Type
II codes in Theorem 9.6.1(ii) for n = 8. Any Type II code of length 8 must be an [8, 4, 4]
code. Such a code C is unique and the left-hand side of the mass formula is 8!/|PAut(C)|; the
right-hand side is 30. Therefore, |PAut(C)| = 1344; it is known that PAut(C) is isomorphic
to the 3-dimensional general affine group over F2 which indeed has order 1344.
9.7 Classification
Once we have the mass formula for a class of codes, we can attempt to classify them.
Suppose we want to classify the self-dual codes of length n. The strategy is to classify
these codes for the smallest lengths first. Once that is accomplished, we proceed to larger
lengths using the results obtained for smaller lengths. We discuss the overall strategy for
classification focusing primarily on self-dual binary codes.
9.7.1 The Classification Algorithm
There is a general procedure that has been used in most of the classifications attempted. The Classification Algorithm, which we state for self-dual binary codes, is as
follows:
I. Find a self-dual binary code C 1 of length n.
II. Compute the size of the automorphism group PAut(C 1 ) of the code found in Step I and
calculate the contribution of n!/|PAut(C 1 )| to the left-hand side of (9.9).
III. Find a self-dual binary code C j not equivalent to any previously found. Compute the
size of the automorphism group PAut(C j ) and calculate n!/|PAut(C j )|; add this to
the contributions to the left-hand side of (9.9) from the inequivalent codes previously
found.
IV. Repeat Step III until the total computed on the left-hand side of (9.9) gives the amount
on the right-hand side.
The classification for Type II, III, and IV codes is analogous where we use the appropriate
mass formula from Theorem 9.6.1.
Example 9.7.1 We classify all the self-dual binary codes of length n = 8. The right-hand
side of the mass formula is (2 + 1)(2^2 + 1)(2^3 + 1) = 135. From Example 9.6.2, the unique
[8, 4, 4] code C1 has automorphism group of size 1344. It contributes 30 to the left-hand
side of the mass formula. There is also the self-dual code C2 = C ⊕ C ⊕ C ⊕ C, where C
is the [2, 1, 2] binary repetition code. Interchanging the two coordinates of any direct
summand is an automorphism; these interchanges generate a subgroup of PAut(C2) of order 2^4.
Furthermore, the four summands can be permuted among themselves in 4! ways. All the
automorphisms are generated by these; the group has order 2^4 · 4! and C2 contributes 105 to the left-hand
side of the mass formula. The total contribution is 135, which equals the right-hand side.
Hence there are only two inequivalent self-dual binary codes of length 8.
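The arithmetic in this classification can be replayed in a few lines; the following added sketch takes the automorphism group order 1344 from Example 9.6.2 and checks that the two contributions exhaust the right-hand side of the mass formula:

```python
from math import factorial, prod

n = 8
rhs = prod(2 ** i + 1 for i in range(1, n // 2))   # (2 + 1)(2^2 + 1)(2^3 + 1) = 135

aut_e8 = 1344                        # |PAut| of the unique [8, 4, 4] code (Example 9.6.2)
aut_4i2 = 2 ** 4 * factorial(4)      # |PAut| of the direct sum of four [2, 1, 2] codes

lhs = factorial(n) // aut_e8 + factorial(n) // aut_4i2   # 30 + 105
print(lhs, rhs)   # both 135: the two codes complete the classification
```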
Exercise 532 Find a complete list of all inequivalent self-dual binary codes of lengths:
(a) n = 2,
(b) n = 4,
(c) n = 6, and
(d) n = 10.
As the lengths grow, the classification becomes more difficult. There are methods to
simplify this search. When many of the original classifications were done, computing the
automorphism groups was also problematic; now, with computer programs that include
packages for codes, finding the automorphism groups is simplified.
We now go into more detail regarding classification of binary codes and the specifics
of carrying out the Classification Algorithm. There are several self-orthogonal codes
that will arise frequently. They are denoted i2, e7, e8, and d2m, with m ≥ 2. The subscript on these codes indicates their length. The codes i2, e7, and e8 have generator
matrices

    i2 : [1 1],

         [1 1 1 1 0 0 0]          [1 1 1 1 0 0 0 0]
    e7 : [0 0 1 1 1 1 0],    e8 : [0 0 1 1 1 1 0 0]
         [1 0 1 0 1 0 1]          [0 0 0 0 1 1 1 1]
                                  [1 0 1 0 1 0 1 0].
The [2m, m − 1, 4] code d2m has generator matrix

           [1 1 1 1 0 0 0 0 · · · 0 0 0 0]
           [0 0 1 1 1 1 0 0 · · · 0 0 0 0]
    d2m :  [0 0 0 0 1 1 1 1 · · · 0 0 0 0].
           [             . . .           ]
           [0 0 · · · 1 1 1 1 0 0        ]
           [0 0 · · · 0 0 1 1 1 1        ]
The notation for these codes is analogous to that used for lattices; see [246].
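The generator matrix of d2m is easy to produce programmatically; this added sketch builds it and checks that the rows have weight 4 and are pairwise orthogonal over F2, so that d2m is self-orthogonal:

```python
def d2m_rows(m):
    # Row i (0-based) has 1s in the four consecutive coordinates
    # 2i+1, ..., 2i+4 (1-based) of a length-2m vector, for i = 0, ..., m-2.
    return [[1 if 2 * i <= j < 2 * i + 4 else 0 for j in range(2 * m)]
            for i in range(m - 1)]

rows = d2m_rows(6)                                # d12, a [12, 5, 4] code
assert all(sum(r) == 4 for r in rows)             # every generator has weight 4
assert all(sum(u * v for u, v in zip(r, s)) % 2 == 0
           for r in rows for s in rows)           # pairwise orthogonal over F2
print(len(rows), len(rows[0]))   # 5 generators of length 12
```

Consecutive rows overlap in exactly two coordinates, which is why each inner product is even.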
Notice that e8 is merely the [8, 4, 4] extended Hamming code Ĥ3, while e7 is the [7, 3, 4]
simplex code. As we mentioned in Example 9.6.2, the automorphism group of e8 is isomorphic to the 3-dimensional general affine group over F2, often denoted GA3(2), of order
1344. This group is a triply-transitive permutation group on eight points. The subgroup
stabilizing one point is a group of order 168, denoted either PSL2 (7) or PSL3 (2); this is the
automorphism group of e7 . The codes i 2 and d4 have automorphism groups Sym2 and Sym4
of orders 2 and 24, respectively. When m > 2, the generator matrix for d2m has identical
columns {2i − 1, 2i} for 1 ≤ i ≤ m. Any permutation interchanging columns 2i − 1 and
2i is an automorphism of d2m ; any permutation sending all pairs {2i − 1, 2i} to pairs is also
an automorphism. These generate all automorphisms, and PAut(d2m) has order 2^m m! when
m > 2.
We need some additional terminology and notation. If the code C 1 is equivalent to C 2 ,
then we denote this by C 1 ≃ C 2 . A code C is decomposable if C ≃ C 1 ⊕ C 2 , where C 1 and
C 2 are nonzero codes. If C is not equivalent to the direct sum of two nonzero codes, it is
indecomposable. The codes i 2 , e7 , e8 , and d2m are indecomposable. The symbol mC will
denote the direct sum of m copies of the code C; its length is obviously m times the length
of C. Finally, when we say that C ′ (for example, i 2 , e7 , e8 , or d2m ) is a “subcode” of C where
the length n of C is longer than the length n ′ of C ′ , we imply that n − n ′ additional zero
coordinates are appended to C ′ to make it length n.
The following basic theorem is very useful in classifying self-dual binary codes.
Theorem 9.7.2 Let C be a self-orthogonal binary code of length n and minimum weight d.
The following hold:
(i) If d = 2, then C ≃ mi 2 ⊕ C 1 for some integer m ≥ 1, where C 1 is a self-orthogonal
code of length n − 2m with minimum weight 4 or more.
(ii) If e8 is a subcode of C, then C ≃ e8 ⊕ C 1 for some self-dual code C 1 of length n − 8.
(iii) If d = 4, then the subcode of C spanned by the codewords of C of weight 4 is a direct
sum of copies of d2m s, e7 s, and e8 s.
Proof: We leave the proofs of (i) and (ii) to Exercise 533 and sketch the proof of (iii).
Let C ′ be the self-orthogonal subcode of C spanned by the weight 4 vectors. We prove
(iii) by induction on the size of supp(C ′ ). If C ′ is decomposable, then it is the direct sum
of two self-orthogonal codes each spanned by weight 4 vectors. By induction, each of
these is a direct sum of copies of d2m s, e7 s, and e8 s. Therefore it suffices to show that any
indecomposable self-orthogonal code, which we still denote C ′ , spanned by weight 4 vectors
369
9.7 Classification
is either d2m , e7 , or e8 . In the rest of this proof, any vector denoted ci will be a weight 4 vector
in C′. For every pair of vectors c1 ≠ c2, either supp(c1) and supp(c2) are disjoint or they
intersect in two coordinates as C ′ is self-orthogonal. Also as C ′ is indecomposable, for every
vector c1 , there is a vector c2 such that their supports overlap in exactly two coordinates.
Suppose the supports of c1 and c2 overlap in two coordinates. By permuting coordinates,
we may assume that
c1 = 1111000000 · · · and
c2 = 0011110000 · · · .
(9.10)
By Exercise 534, if c3 has support neither contained in the first six coordinates nor disjoint
from the first six coordinates, by reordering coordinates we may assume that one of the
following holds:
(a) c3 = 0000111100 · · · or
(b) c3 = 1010101000 · · · .
Suppose that (b) holds. If all weight 4 vectors of C ′ are in span{c1 , c2 , c3 }, then C ′ = e7 .
Otherwise as C ′ is self-orthogonal and indecomposable, there is a vector c4 with support overlapping the first seven coordinates but not contained in them. By Exercise 535,
span{c1 , c2 , c3 , c4 } is e8 possibly with zero coordinates added. By part (ii), C ′ = e8 as C ′ is
indecomposable. Therefore we may assume (a) holds.
By this discussion we can begin with c1 as in (9.10). If C ′ has length 4, then C ′ = d4 .
Otherwise we may assume that c2 exists as in (9.10) and these two vectors span a d6 possibly
with zero coordinates added. If this is not all of C ′ , then we may assume that there is a vector
c3 as in (a). These three vectors span a d8 again possibly with zero coordinates added. If
this is not all of C ′ , we can add another vector c4 , which must have a form like (a) but not
(b) above, and hence we create a d10 with possible zero coordinates appended. Continuing
in this manner we eventually form a d2m where C′ has length 2m with m ≥ 5. If this is not
all of C′, then C′ is contained in d2m^⊥, which is spanned by the vectors in d2m together with
a = 0101010101 · · · and b = 1100000000 · · · . Since m ≥ 5, all of the weight 4 vectors in
C′ must actually be in d2m because adding vectors a and b to d2m does not produce any new
weight 4 vectors. Thus C′ = d2m.
Exercise 533 Prove Theorem 9.7.2(i) and (ii).
Exercise 534 With the notation as in the proof of Theorem 9.7.2, prove that if c1 =
1111000000 · · · and c2 = 0011110000 · · · and if c3 has support neither contained in the first
six coordinates nor disjoint from the first six coordinates, by reordering coordinates we may
assume that one of the following holds:
(a) c3 = 0000111100 · · · or
(b) c3 = 1010101000 · · · .
Exercise 535 With the notation as in the proof of Theorem 9.7.2, prove that if c1 =
1111000000 · · · , c2 = 0011110000 · · · , and c3 = 1010101000 · · · and if c4 has support
overlapping the first seven coordinates but not contained in them, then span{c1 , c2 , c3 , c4 }
is e8 possibly with zero coordinates added.
If we want to classify all self-dual codes of length n, we first classify all self-dual
codes of length less than n. To begin Step I of the Classification Algorithm, choose some
Table 9.8 Glue elements and group orders for d2m and e7

  Component     Glue element           Symbol   |G0|          |G1|
  d4            0000                   0        4             6
                1100                   x
                1010                   y
                1001                   z
  d2m (m > 2)   00000000 · · · 0000    0        2^{m-1} m!    2
                01010101 · · · 0101    a
                11000000 · · · 0000    b
                10010101 · · · 0101    c
  e7            0000000                0        168           1
                0101010                d
decomposable code (e.g. (n/2)i2). For Step II, compute its automorphism group order
(e.g. 2^{n/2}(n/2)! if the code is (n/2)i2) and its contribution to the mass formula (e.g.
n!/(2^{n/2}(n/2)!)). Repeat Step III by finding the remaining decomposable codes; this is
possible as the classification of self-dual codes of smaller lengths has been completed. The
task now becomes more difficult as we must find the indecomposable codes. The methods
used at times are rather ad hoc; however, the process known as “gluing” can be very useful.
9.7.2 Gluing theory
Gluing is a way to construct self-dual codes C of length n from shorter self-orthogonal
codes systematically. We build C by beginning with a direct sum C 1 ⊕ C 2 ⊕ · · · ⊕ C t of
self-orthogonal codes, which forms a subcode of C. If C is decomposable, then C is this
direct sum. If not, then we add additional vectors, called glue vectors, so that we eventually
obtain a generator matrix for C from the generator matrix for C 1 ⊕ C 2 ⊕ · · · ⊕ C t and the glue
vectors. The codes C i are called the components of C. Component codes are chosen so that
any automorphism of C will either send a component code onto itself or possibly permute
some components among themselves. This technique has several additional advantages;
namely, it can be used to compute the order of the automorphism group of the code, which
is crucial in the Classification Algorithm, and it also allows us to describe these codes
efficiently. Gluing has been most effective in constructing self-dual codes of minimum
weight 4. In this situation, the direct sum of component codes will contain all the vectors
of weight 4 in the resulting code; if this does not produce the entire code, we add glue
vectors taking care not to introduce any new weight 4 vectors. These component codes
are determined in Theorem 9.7.2 to be d2m , e7 , and e8 . However, if e8 is a component,
Theorem 9.7.2(ii) shows that C is decomposable; in this case we do not need to use glue.
The glue vectors for d2m and e7 consist of shorter length glue elements that are juxtaposed
to form the glue vectors. The glue elements are simply elements of cosets of C i in C i⊥ . Since
we can form any glue element from a coset leader and an element in the component, we
can choose our possible glue elements from the coset leaders. In Table 9.8, we give the glue
elements for the codes d2m and e7; we use the basis for the codes as given in Section 9.7.1.
The code d2m is a [2m, m − 1] code and so d2m^⊥ is a [2m, m + 1] code; thus there are four
cosets of d2m in d2m^⊥ and hence four glue elements for d2m. Similarly, there are two glue
elements for e7. Recall that coset leaders have minimum weight in their cosets.
Exercise 536 Verify that the glue elements listed in Table 9.8 for d4 are indeed coset
leaders of different cosets of d4 in d4⊥ . Do the same for each of the other codes in the
table.
Example 9.7.3 We construct a [14, 7, 4] self-dual code using glue. We choose components
2e7 ; this subcode is 6-dimensional and has minimum weight 4. We need one glue vector.
Since the glue vector must have weight at least 6, we choose dd from Table 9.8 to be the
glue vector. Notice that as d + e7 is a coset of e7 of weight 3, dd + 2e7 is a coset of 2e7 of
weight 6. If C is the resulting code, C is indeed a [14, 7, 4] code. As e7 is self-orthogonal, so
is 2e7 . As d is in e7⊥ and dd is orthogonal to itself, the coset dd + 2e7 is self-orthogonal and
orthogonal to 2e7 ; therefore C is indeed self-dual. This is the only indecomposable self-dual
[14, 7, 4] code. We label this code 2e7 to indicate the components. As we see, this is enough
to determine the glue. From this label, a generator matrix is easy to construct.
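Example 9.7.3 can be checked by brute force. This added sketch represents codewords as Python integers (bitmasks), uses one valid basis for e7 (the [7, 3, 4] simplex code) together with the glue element d = 0101010 from Table 9.8, and enumerates all codewords of the glued code:

```python
def bits(s):
    # '1111000' -> bitmask with bit i set iff character i is '1'
    return sum(1 << i for i, ch in enumerate(s) if ch == '1')

def span(gens):
    # All F2-linear combinations of the generators.
    words = {0}
    for g in gens:
        words |= {w ^ g for w in words}
    return words

# One choice of generator matrix for e7, and the glue element d.
e7 = [bits('1111000'), bits('0011110'), bits('1010101')]
d = bits('0101010')

# 2e7 on 14 coordinates, plus the single glue vector dd.
gens = e7 + [g << 7 for g in e7] + [d | (d << 7)]
assert all(bin(u & v).count('1') % 2 == 0 for u in gens for v in gens)  # self-orthogonal

C = span(gens)
wts = sorted(bin(w).count('1') for w in C if w)
print(len(C), wts[0])   # 128 codewords of length 14; minimum weight 4
```

Since dim C = 7 = 14/2 and the generators are pairwise orthogonal, C is self-dual, confirming the [14, 7, 4] parameters.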
In Example 9.7.3, we see that the components determine the glue. That is the typical
situation. In the literature, the label arising from the components is usually used to identify
the code. There are situations where the label does not uniquely determine the code (up to
equivalence); in that case, glue is attached to the label. Glue is also attached to the label if
the glue is particularly difficult to determine.
Example 9.7.4 We construct two [16, 8, 4] self-dual codes. The first code C has components 2e8; since this is already self-dual, we do not need to add glue. We would simply
label this code as 2e8; this code is Type II. The second code has only one component, d16,
which is a [16, 7, 4] self-orthogonal code; we need to add one glue vector. Since we want
our resulting code to have minimum weight 4, we can use either a or c but not b from
Table 9.8. But the code using c as glue is equivalent to the code using a as glue, simply by
interchanging the first two coordinates, which is an automorphism of d16. Letting the glue
be a, which is in d16^⊥ and which is orthogonal to itself, the resulting code, which we label
d16, is a [16, 8, 4] Type II code.
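The second construction can likewise be verified by machine; this added sketch builds d16 from its generator rows, adjoins the glue a, and reports the dimension and the weights that occur:

```python
def bits(s):
    # bitmask with bit i set iff character i of s is '1'
    return sum(1 << i for i, ch in enumerate(s) if ch == '1')

def span(gens):
    words = {0}
    for g in gens:
        words |= {w ^ g for w in words}
    return words

# d16: row i has 1s in coordinates 2i+1, ..., 2i+4 (1-based), i = 0, ..., 6.
d16 = [bits('0' * (2 * i) + '1111') for i in range(7)]
a = bits('01' * 8)                     # glue element a from Table 9.8

C = span(d16 + [a])
wts = sorted({bin(w).count('1') for w in C})
print(len(C), wts)   # 256 codewords; weights [0, 4, 8, 12, 16]
```

The output confirms the [16, 8, 4] parameters: 2^8 codewords, minimum nonzero weight 4.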
Exercise 537 Construct generator matrices for the self-dual binary codes with labels:
(a) 2e7 of length 14,
(b) 2e8 of length 16, and
(c) d16 of length 16.
We now describe how to determine the order of the automorphism group of a code
obtained by gluing. To simplify notation, let G(C) be the automorphism group ΓAut(C) =
PAut(C) of C. Suppose that C is a self-dual code obtained by gluing. Then any element of G(C)
permutes the component codes. Therefore G(C) has a normal subgroup G ′ (C) consisting of
those automorphisms that fix all components. Thus the quotient group G 2 (C) = G(C)/G ′ (C)
acts as a permutation group on the components. Hence |G(C)|, which is what we wish
to calculate, is |G 2 (C)||G ′ (C)|. Let G 0 (C) be the normal subgroup of G ′ (C) that fixes
each glue element modulo its component. Then G 1 (C) = G ′ (C)/G 0 (C) permutes the glue.
Therefore
|G(C)| = |G 0 (C)||G 1 (C)||G 2 (C)|.
If the components of C are C i , then |G 0 (C)| is the product of all |G 0 (C i )|. Since e7 is the
even-like subcode of the [7, 4, 3] quadratic residue code Q and d + e7 is all the odd weight
vectors in Q, all automorphisms of Q fix both e7 and d + e7 . Since the automorphism group
of Q is isomorphic to PSL2 (7), as described in Section 9.7.1, |G 0 (e7 )| = 168. Consider
d2m with m > 2. In Section 9.7.1, we indicated that |G(d2m)| = 2^m m!. Half of the automorphisms in G(d2m) interchange the cosets a + d2m and c + d2m, and all automorphisms
fix b + d2m. Thus |G0(d2m)| = 2^{m-1} m! and |G1(d2m)| = 2. The situation is a bit different
if m = 2. There are exactly four permutations (the identity, (1, 2)(3, 4), (1, 3)(2, 4), and
(1, 4)(2, 3)) that fix d4 , x + d4 , y + d4 , and z + d4 ; hence |G 0 (d4 )| = 4. As G(d4 ) = Sym4 ,
|G 1 (d4 )| = 6.
If C has component codes of minimum weight 4, usually |G 1 (C)| = 1, but not always as
the next example illustrates. The group G 1 (C) projects onto each component C i as a subgroup
of G 1 (C i ). In technical terms, G 1 (C) is a subdirect product of the product of G 1 (C i ); that is,
pieces of G 1 (C 1 ) are attached to pieces of G 1 (C 2 ), which are attached to pieces of G 1 (C 3 ), etc.
In particular, |G 1 (C)| divides the product of |G 1 (C i )|. The values of |G i (d2m )| and |G i (e7 )|
for i = 0 and 1 are summarized in Table 9.8.
Example 9.7.5 We construct a [24, 12, 4] Type II code C labeled 6d4 . This code requires
six glue vectors. A generator matrix is

         [d4  0   0   0   0   0 ]
         [0   d4  0   0   0   0 ]
         [0   0   d4  0   0   0 ]
         [0   0   0   d4  0   0 ]
         [0   0   0   0   d4  0 ]
    G =  [0   0   0   0   0   d4],
         [x   0   y   x   y   0 ]
         [x   0   0   y   x   y ]
         [x   y   0   0   y   x ]
         [x   x   y   0   0   y ]
         [x   y   x   y   0   0 ]
         [y   0   z   y   z   0 ]

where d4 represents the basis vector 1 of d4, and x, y, and z are from Table 9.8. The first six
rows generate the component code 6d4. Note that all weight 4 codewords are precisely the
six codewords in the first six rows of G. As |G0(d4)| = 4, |G0(C)| = 4^6. By Exercise 538,
|G1(C)| = 3. Since any permutation in Sym6 permutes the six components, |G2(C)| = 720.
Thus

    |G(C)| = |G0(C)||G1(C)||G2(C)| = 4^6 · 3 · 720.
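The glue in Example 9.7.5 can be checked mechanically. This added sketch assembles G over the 24 coordinates, with x = 1100, y = 1010, z = 1001 on each 4-coordinate block, and confirms that the result is a doubly-even self-dual [24, 12, 4] code:

```python
def bits(s):
    return sum(1 << i for i, ch in enumerate(s) if ch == '1')

def place(blocks):
    # Concatenate six 4-bit block patterns into one 24-coordinate word.
    return sum(bits(b) << (4 * k) for k, b in enumerate(blocks))

d4, x, y, z, o = '1111', '1100', '1010', '1001', '0000'

comp = [place([d4 if j == k else o for j in range(6)]) for k in range(6)]
glue = [place(p) for p in (
    [x, o, y, x, y, o], [x, o, o, y, x, y], [x, y, o, o, y, x],
    [x, x, y, o, o, y], [x, y, x, y, o, o], [y, o, z, y, z, o])]

C = {0}
for g in comp + glue:
    C |= {w ^ g for w in C}

assert len(C) == 2 ** 12                             # dimension 12 = 24/2
assert all(bin(w).count('1') % 4 == 0 for w in C)    # doubly-even, so Type II
assert min(bin(w).count('1') for w in C if w) == 4   # minimum weight 4
print('6d4 with this glue is a [24, 12, 4] Type II code')
```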
Table 9.9 Labels of indecomposable self-dual codes

  Length   Components
  8        e8
  12       d12
  14       2e7
  16       d16, 2d8
  18       d10 ⊕ e7 ⊕ f1, 3d6
  20       d20, d12 ⊕ d8, 2d8 ⊕ d4, 2e7 ⊕ d6, 3d6 ⊕ f2, 5d4
Exercise 538 In the notation of Example 9.7.5, do the following:
(a) Verify that G 1 (C) contains the permutation (1, 2, 3) acting on each component (if we
label the coordinates of each d4 with {1, 2, 3, 4}).
(b) Show that |G 1 (C)| = 3.
Example 9.7.6 It is easier to construct a [24, 12, 4] Type II code C labeled d10 ⊕ 2e7 as we
only need to add two glue vectors since d10 ⊕ 2e7 has dimension 10. We can do this using
the generator matrix

         [d10  O   O ]
         [O    e7  O ]
    G =  [O    O   e7],
         [a    d   0 ]
         [c    0   d ]

where d10 and e7 represent a basis for these codes, O is the appropriate size zero matrix,
and a, c, and d are found in Table 9.8. It is easy to verify that the two glue vectors have
weight 8, are orthogonal to each other, and have sum also of weight 8. The order of G(C) is

    |G(C)| = |G0(C)||G1(C)||G2(C)| = [(2^4 · 5!) · 168^2] · 1 · 2 = 2^14 · 3^3 · 5 · 7^2.
Exercise 539 Find the size of the automorphism groups of the following binary self-dual
codes:
(a) 2e7 of length 14,
(b) 2e8 of length 16, and
(c) d16 of length 16.
Not every conceivable component code will lead to a self-dual code. For example, there
are no [24, 12, 4] Type II codes with the component codes d4 or 2d4 , although there are
Type II codes of length 32 with these component codes. Note also that it is possible to have
inequivalent codes with the same label; in such cases it is important to specify the glue.
It may happen that when constructing the component codes some positions have no
codewords; then those positions are regarded as containing the free code, denoted f n , whose
only codeword is the zero vector of length n. Then |G 0 ( f n )| = 1. In Table 9.9 we give the
labels for the indecomposable self-dual codes of length n ≤ 20; note that the self-dual codes
of length 10 are all decomposable.
Table 9.10 Number of inequivalent self-dual binary codes

  Length   Number self-dual     Length   Number self-dual
  2        1                    18       9
  4        1                    20       16
  6        1                    22       25
  8        2                    24       55
  10       2                    26       103
  12       3                    28       261
  14       4                    30       731
  16       7

  Length   Number Type II
  8        1
  16       2
  24       9
  32       85
Exercise 540 Find the glue vectors and order of the automorphism group of the self-dual
code of length 18 with label d10 ⊕ e7 ⊕ f 1 .
Example 9.7.7 There are seven indecomposable Type II codes of length 24. Their labels
are d24 , 2d12 , d10 ⊕ 2e7 , 3d8 , 4d6 , 6d4 , and g24 , where g24 is the [24, 12, 8] extended binary
Golay code.
Exercise 541 Find the glue vectors and order of the automorphism group of the [24, 12, 4]
self-dual code with labels:
(a) 2d12 , and
(b) 3d8 .
The classification of self-dual binary codes began with the classification for lengths
2 ≤ n ≤ 24 in [262, 281]. Then the Type II codes of length 32 were classified in [53], and,
finally, the self-dual codes of lengths 26, 28, and 30 were found in [53, 56, 264]. Table 9.10
gives the number of inequivalent self-dual codes of lengths up through 30 and the number
of inequivalent Type II codes up through length 32.
Exercise 542 In Table 9.9 we give the labels for the components of the two indecomposable
self-dual codes of length 16. By Table 9.10 there are seven self-dual codes of length 16.
Give the labels of the decomposable self-dual codes of length 16.
Recall by Theorem 9.7.2 that if a self-dual code C of length n has minimum weight 2, it
is decomposable and equivalent to mi 2 ⊕ C 1 , where C 1 is a self-dual code of length n − 2m
and minimum weight at least 4. So in classifying codes of length n, assuming the codes of
length less than n have been classified, we can easily find those of minimum weight 2. Also
by Theorem 9.7.2, if e8 is a subcode of C, then C is decomposable and can be found. In the
classification of the indecomposable self-dual codes of lengths through 24 with minimum
weight at least 4, the components for all codes, except the [24, 12, 8] extended binary Golay
code, were chosen from d2m and e7 . However, in classifying the Type II codes of length
32, it was also necessary to have components with minimum weight 8. There are several of
these and they are all related to the binary Golay codes, in one way or another [53, 56, 291].
One of these, denoted g16, is equivalent to the [16, 5, 8] first order Reed–Muller code. Let
g24 be the [24, 12, 8] extended binary Golay code. If e is any weight 8 codeword in g24 with
support P1, by the Balance Principle, the subcode with support contained in the positions
complementary to P1 must be a [16, 5, 8] code, which is unique and hence equivalent to g16.
Reordering coordinates so that P1 is the first eight and writing a generator matrix for g24
in the form (9.3), the last six rows can be considered as glue for e glued to g16. Note that
we already used parts of this construction in obtaining the Nordstrom–Robinson code in
Section 2.3.4.

Table 9.11 Number of inequivalent Type III and IV codes

  Length   Number Type III     Length   Number Type IV
  4        1                   2        1
  8        1                   4        1
  12       3                   6        2
  16       7                   8        3
  20       24                  10       5
                               12       10
                               14       21
                               16       55
In classifying the Type II codes of length 32, five inequivalent extremal [32, 16, 8] codes
were found. One of these is the extended quadratic residue code and another is the second
order Reed–Muller code. As all five are extremal, they have the same weight distribution;
their generator matrices can be found in [53, 56, 291]. The situation was too complicated
to check the progress of the classification using only the mass formula of Theorem 9.5.5(i),
as had been done for lower lengths. In this case, subformulas were developed that counted
the number of codes containing a specific code, such as d8 . In the end, the original mass
formula was used to check that the classification was complete. The Type I codes of lengths
26, 28, and 30 could then be found once it was observed that they were all children of
Type II codes of length 32 [53, 56, 264]. Understanding the structure of a length 32 Type II
code helps determine the equivalence or inequivalence of its numerous children.
At length 40, there are N = (1 + 1)(2 + 1)(2^2 + 1) · · · (2^18 + 1) Type II codes by
Theorem 9.5.5(i). As N/40! ≥ 17 000, it does not make sense to attempt to classify the
Type II codes of length 40 or larger; see Exercise 543.
Exercise 543 Explain, in more detail, why there are at least 17 000 inequivalent Type II
codes of length 40.
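A sketch for Exercise 543 (added here): an equivalence class of length-40 codes contains at most 40! codes, so N/40! is a lower bound on the number of inequivalent Type II codes of length 40.

```python
from math import factorial, prod

# Number of Type II codes of length 40 (Theorem 9.5.5(i)):
N = prod(2 ** i + 1 for i in range(0, 19))

# Each equivalence class has at most 40! members, so there are at least
# N / 40! inequivalent Type II codes of length 40.
print(N // factorial(40))
```

The printed quotient exceeds 17 000, which is the bound quoted in the text.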
As the extremal codes are most interesting, it makes sense only to classify those. However,
formulas for the number of these are not known. On occasion, different methods have been
used to classify extremal codes successfully. The number of inequivalent extremal binary
codes of any length greater than 32, is, as yet, unknown.
Ternary self-dual codes have been classified up to length n = 20. In the ternary case, it
is known that any self-orthogonal code generated by vectors of weight 3 is a direct sum
of the [3, 1, 3] code e3 , with generator matrix [1 1 1], and the [4, 2, 3] tetracode, e4 . In
these classifications, components of minimum weight 6 are also needed; many of these are
derived from the [12, 6, 6] extended ternary Golay code. Table 9.11 gives the number of
inequivalent Type III [55, 222, 260, 282] and Type IV codes [55]. The extremal [24, 12, 9]
Type III codes have been classified in [191] using the classification of the 24 × 24 Hadamard
matrices. The [24, 12, d] Type III codes with d = 3 or 6 have not been classified and are of
little interest since the extremal ones are known.
While self-dual codes have a number of appealing properties, there is no known efficient
decoding algorithm for such codes. In Chapter 10 we give two algorithms to decode the
[24, 12, 8] extended binary Golay code; these algorithms can be generalized, but only in a
rather restricted manner.
Research Problem 9.7.8 Find an efficient decoding algorithm that can be used on all
self-dual codes or on a large family of such codes.
9.8 Circulant constructions
Many of the codes we have studied or will study have a particular construction using
circulant matrices. An m × m matrix A is circulant provided

         [a1     a2   a3   · · ·  am  ]
         [am     a1   a2   · · ·  am−1]
    A =  [am−1   am   a1   · · ·  am−2].
         [             . . .          ]
         [a2     a3   a4   · · ·  a1  ]
An (m + 1) × (m + 1) matrix B is bordered circulant if

         [α  β  · · ·  β]
         [γ             ]
    B =  [γ      A      ],                                       (9.11)
         [.             ]
         [γ             ]

where A is circulant. Notice that the identity matrix In is both a circulant and a bordered
circulant matrix.
We say that a code has a double circulant generator matrix or a bordered double
circulant generator matrix provided it has a generator matrix of the form

    [Im  A]   or   [Im+1  B],                                    (9.12)

where A is an m × m circulant matrix and B is an (m + 1) × (m + 1) bordered circulant matrix, respectively. A code has a double circulant construction or bordered double
circulant construction provided it has a double circulant or bordered double circulant
generator matrix, respectively.
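These definitions translate directly into code; the following added sketch builds a circulant matrix from its first row and the bordered form (9.11):

```python
def circulant(row):
    # Row k is the first row cyclically shifted k places to the right.
    m = len(row)
    return [[row[(j - k) % m] for j in range(m)] for k in range(m)]

def bordered_circulant(alpha, beta, gamma, row):
    # Form (9.11): top row (alpha, beta, ..., beta), first column gamma,
    # and the circulant A in the lower-right m x m corner.
    A = circulant(row)
    return [[alpha] + [beta] * len(row)] + [[gamma] + Ak for Ak in A]

A = circulant([1, 2, 3])
print(A)   # [[1, 2, 3], [3, 1, 2], [2, 3, 1]]
```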
Example 9.8.1 Using the generator matrix for the [12, 6, 6] extended ternary Golay
code given in Section 1.9.2 we see that this code has a bordered double circulant
construction.
In a related concept, an m × m matrix R is called reverse circulant provided

         [r1   r2   r3   · · ·  rm  ]
         [r2   r3   r4   · · ·  r1  ]
    R =  [r3   r4   r5   · · ·  r2  ].
         [           . . .          ]
         [rm   r1   r2   · · ·  rm−1]
An (m + 1) × (m + 1) matrix B is bordered reverse circulant if it has form (9.11) where A is
reverse circulant. Again a code has a reverse circulant generator matrix or a bordered reverse
circulant generator matrix provided it has a generator matrix of the form (9.12) where A is
an m × m reverse circulant matrix or B is an (m + 1) × (m + 1) bordered reverse circulant
matrix, respectively. Note that we drop the term “double” because the identity matrix is not
reverse circulant. A code has a reverse circulant construction or bordered reverse circulant
construction provided it has a reverse circulant or bordered reverse circulant generator
matrix, respectively.
Example 9.8.2 The construction of the [24, 12, 8] extended binary Golay code in
Section 1.9.1 is a bordered reverse circulant construction.
Exercise 544 Prove that if R is a reverse circulant matrix, then R = R^T.
Example 9.8.3 Let A be a 5 × 5 circulant matrix with rows a1 , . . . , a5 . Form a matrix R
with rows a1 , a5 , a4 , a3 , a2 . Now do the same thing with the double circulant matrix [I5 A].
So we obtain
         [ 1 0 0 0 0  a1 a2 a3 a4 a5 ]
         [ 0 1 0 0 0  a5 a1 a2 a3 a4 ]
[I5 A] = [ 0 0 1 0 0  a4 a5 a1 a2 a3 ]
         [ 0 0 0 1 0  a3 a4 a5 a1 a2 ]
         [ 0 0 0 0 1  a2 a3 a4 a5 a1 ]

         [ 1 0 0 0 0  a1 a2 a3 a4 a5 ]
         [ 0 0 0 0 1  a2 a3 a4 a5 a1 ]
      →  [ 0 0 0 1 0  a3 a4 a5 a1 a2 ] = [P5 R],
         [ 0 0 1 0 0  a4 a5 a1 a2 a3 ]
         [ 0 1 0 0 0  a5 a1 a2 a3 a4 ]
where P5 is a matrix obtained from I5 by permuting its columns. Notice that R is reverse
circulant and that by permuting the columns of P5 we can obtain I5 . So the code generated
by the double circulant generator matrix [I5 A] equals the code generated by the matrix
[P5 R], which is equivalent to the code generated by the reverse circulant generator matrix
[I5 R]. You are asked to generalize this in Exercise 545.
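The row-reversal trick of Example 9.8.3 can be checked symbolically. In this Python sketch (ours; the entries are kept as strings a1, . . . , a5 so the index pattern stays visible), we reverse the order of rows 2 through 5 of the circulant A and confirm that the resulting matrix R is reverse circulant, and hence symmetric, the latter being the content of Exercise 544.

```python
def circulant(first_row):
    """m x m circulant matrix: each row is the previous one shifted right."""
    m = len(first_row)
    return [first_row[-i:] + first_row[:-i] for i in range(m)]

# Symbolic entries so the index pattern is visible.
a = ["a1", "a2", "a3", "a4", "a5"]
A = circulant(a)

# Keep row 1 and reverse the order of rows 2..5, as in Example 9.8.3;
# R is the right half of the reordered matrix [P5 R].
R = [A[i] for i in [0, 4, 3, 2, 1]]

# R should be reverse circulant: each row is the previous row shifted LEFT...
is_reverse_circulant = all(R[i] == R[i - 1][1:] + R[i - 1][:1]
                           for i in range(1, 5))

# ...and hence symmetric, R = R^T (Exercise 544).
is_symmetric = all(R[i][j] == R[j][i] for i in range(5) for j in range(5))
```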
Exercise 545 Prove the following:
(a) A code has a double circulant construction if and only if it is equivalent to a code with
a reverse circulant construction.
(b) A code has a bordered double circulant construction if and only if it is equivalent to a
code with a bordered reverse circulant construction.
A subfamily of formally self-dual codes is the class of isodual codes. A code is isodual
if it is equivalent to its dual. Clearly, any isodual code is formally self-dual; however, a
formally self-dual code need not be isodual as seen in [84]. Exercise 546 shows how to
construct isodual codes and hence formally self-dual codes.
Exercise 546 Prove the following:
(a) A code of even length n with generator matrix [In/2 A], where A = A^T, is isodual.
(b) A code with a double circulant or reverse circulant construction is isodual. Hint: See
Exercises 544 and 545.
(c) A code with a bordered double circulant or bordered reverse circulant construction is
isodual provided the bordered matrix (9.11) used in the construction satisfies either
β = γ = 0 or both β and γ are nonzero.
9.9 Formally self-dual codes
In this section we examine general properties of formally self-dual codes and describe the
current state of their classification. Recall that a code is formally self-dual if the code and
its dual have the same weight enumerator. When considering codes over fields F2 , F3 ,
and F4 , the only formally self-dual codes that are not already self-dual and whose weight
enumerators are combinations of Gleason polynomials are the even formally self-dual
binary codes. Therefore in this section we will only consider even formally self-dual binary
codes; relatively little work has been done on formally self-dual codes over nonbinary
fields.
We begin by summarizing results about self-dual binary codes that either apply or do not
apply to formally self-dual binary codes.
• While self-dual codes contain only even weight codewords, formally self-dual codes may contain odd weight codewords as well. See Exercise 547.
• A formally self-dual binary code C is even if and only if 1 ∈ C; see Exercise 548.
• If C is an even formally self-dual binary code of length n with weight distribution Ai(C), then Ai(C) = An−i(C); again see Exercise 548.
• Gleason's Theorem applies to even formally self-dual binary codes. If C is such a code of length n, then

      WC(x, y) = Σ_{i=0}^{⌊n/8⌋} ai g1(x, y)^{n/2−4i} g2(x, y)^i,

  where g1(x, y) = y^2 + x^2 and g2(x, y) = y^8 + 14x^4y^4 + x^8.
• In Theorem 9.3.4, a bound on the minimum weight of an even formally self-dual binary code is presented. If C is an [n, n/2, d] even formally self-dual binary code, then

      d ≤ 2⌊n/8⌋ + 2  if n ≤ 30,
      d ≤ 2⌊n/8⌋      if n ≥ 32.                                 (9.13)

  Furthermore, an even formally self-dual binary code meeting this bound with n ≤ 30 has a unique weight enumerator. For n ≤ 30, by Theorem 9.3.4, the bound is not met precisely when n = 16 and n = 26. It is natural to ask if this bound can be improved. For example, if n = 72, the bound is d ≤ 18; however, by [32] there is no [72, 36, 18] code of any type. The bound in Theorem 9.3.5 is certainly not going to hold without some modification on the values of n for which it might hold. For example, the bound in Theorem 9.3.5 when n = 42 is d ≤ 8; however, there is a [42, 21, 10] even formally self-dual code.
Research Problem 9.9.1 Improve the bound (9.13) for even formally self-dual binary
codes.
Exercise 547 Let C be the binary code with generator matrix
    [ 1 0 0 0 1 1 ]
    [ 0 1 0 1 0 1 ] .
    [ 0 0 1 1 1 0 ]
Show that C is formally self-dual with odd weight codewords and give its weight enumerator.
Exercise 548 Let C be a formally self-dual binary code of length n with weight distribution
Ai (C).
(a) Prove that C is even if and only if 1 ∈ C.
(b) Prove that if C is even, then Ai (C) = An−i (C).
The Balance Principle can be extended to formally self-dual codes. In fact, it extends to
any code of dimension half its length. We state it in this general form but will apply it to
even formally self-dual binary codes.
Theorem 9.9.2 ([83]) Let C be an [n, n/2] code over Fq. Choose a set of coordinate positions Pn1 of size n1 and let Pn2 be the complementary set of coordinate positions of size n2 = n − n1. Let Ci be the subcode of C all of whose codewords have support in Pni, and let Zi be the subcode of C⊥ all of whose codewords have support in Pni. The following hold:
(i) (Balance Principle)

    dim C1 − n1/2 = dim Z2 − n2/2   and   dim Z1 − n1/2 = dim C2 − n2/2.
(ii) If we reorder coordinates so that P n 1 is the left-most n 1 coordinates and P n 2 is the
right-most n 2 coordinates, then C and C ⊥ have generator matrices of the form
             [ A  O ]                  [ F  O ]
    gen(C) = [ O  B ]  and  gen(C⊥) =  [ O  J ] ,
             [ D  E ]                  [ L  M ]
where [A O] is a generator matrix of C 1 , [O B] is a generator matrix of C 2 , [F O] is
a generator matrix of Z 1 , [O J ] is a generator matrix of Z 2 , and O is the appropriate
size zero matrix. We also have the following:
(a) rank(D) = rank(E) = rank(L) = rank(M).
(b) Let A be the code of length n1 generated by A, AD the code of length n1 generated by the rows of A and D, B the code of length n2 generated by B, and BE the code of length n2 generated by the rows of B and E. Define F, FL, J, and JM analogously. Then A⊥ = FL, B⊥ = JM, F⊥ = AD, and J⊥ = BE.
The proof is left as an exercise.
Exercise 549 Prove Theorem 9.9.2.
Example 9.9.3 In Example 9.4.2 we showed that there is no [10, 5, 4] self-dual binary
code. In this example, we show that, up to equivalence, there is a unique [10, 5, 4] even
formally self-dual binary code C, a code we first encountered in Exercise 501. We use
the notation of Theorem 9.9.2. Order the coordinates so that a codeword in C of weight 4
has its support in the first four coordinate positions and let P n 1 be these four coordinates.
Thus the code C 1 has dimension 1 and A = [1 1 1 1]. As C is even, 1 ∈ C, and so the
all-one vector of length 6 is in B. As there is no [6, 2, 4] binary code containing the all-one
vector, dim C 2 = dim B = 1. Therefore B = [1 1 1 1 1 1]. By the Balance Principle,
dim Z 1 + 1 = dim C 2 = 1 implying that dim Z 1 = 0 and hence that F is the zero code.
Since F⊥ = AD, we may assume that

        [ 1 0 0 0 ]
    D = [ 0 1 0 0 ] .
        [ 0 0 1 0 ]
By Exercise 550, up to equivalence, we may choose the first two rows of E to be 111000
and 110100, respectively. Then, by the same exercise, again up to equivalence, we may
choose the third row of E to be one of 110010, 101100, or 101010; but only the last leads
to a [10, 5, 4] code. Thus we have the generator matrix
1 1 1 1 0 0 0 0 0 0
0 0 0 0 1 1 1 1 1 1
(9.14)
gen(C) = 1 0 0 0 1 1 1 0 0 0 .
0 1 0 0 1 1 0 1 0 0
0 0 1 0 1 0 1 0 1 0
By Exercise 551, the code generated by this matrix is formally self-dual.
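The claims of Example 9.9.3 can be verified by brute force. This Python sketch (ours) spans the generator matrix (9.14) over F2, computes the dual code directly from the orthogonality conditions, and compares weight distributions; it confirms that C is a [10, 5, 4] code, that C and C⊥ have the same weight enumerator (Exercise 551), and that C is not self-dual.

```python
from itertools import product

# Generator matrix (9.14) of the [10, 5, 4] even formally self-dual code.
G = [[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
     [1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
     [0, 1, 0, 0, 1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]]
n = 10

# All 2^5 F2-linear combinations of the rows of G.
C = set()
for coeffs in product([0, 1], repeat=5):
    w = [0] * n
    for c, row in zip(coeffs, G):
        if c:
            w = [(x + y) % 2 for x, y in zip(w, row)]
    C.add(tuple(w))

# Dual code: every vector orthogonal to all rows of G.
C_dual = {v for v in product([0, 1], repeat=n)
          if all(sum(a * b for a, b in zip(v, row)) % 2 == 0 for row in G)}

def weight_dist(code):
    dist = [0] * (n + 1)
    for w in code:
        dist[sum(w)] += 1
    return dist

wd, wd_dual = weight_dist(C), weight_dist(C_dual)
min_wt = min(i for i in range(1, n + 1) if wd[i])
```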
Exercise 550 In the notation of Example 9.9.3, show that, once A, B, and D are as
determined in that example, up to equivalence we may choose the first two rows of E to
Table 9.12 Even formally self-dual binary codes

                 # e.f.s.d.                       # e.f.s.d.
 n   d   # s.d.  (not s.d.)      n   d    # s.d.  (not s.d.)
 2   2     1       0            26   6∗     1       ≥30
 4   2     1       0            28   8      0       1
 6   2     1       1            30   8      0       ≥6
 8   4     1       0            32   8      8       ≥10
10   4     0       1            34   8      0       ≥7
12   4     1       2            36   8      ≥4      ≥23
14   4     1       9            38   8      ≥7      ≥24
16   4∗    2       ≥6           40   10     0       ?
18   6     0       1            42   10     0       ≥6
20   6     0       7            44   10     0       ≥12
22   6     1       ≥1000        46   10     ≥1      ≥41
24   8     1       0            48   12     ≥1      ?
be 111000 and 110100. Then show that the third row is, up to equivalence, either 110010,
101100, or 101010. Furthermore, show that the first two choices for the third row both lead
to codewords in C of weight 2.
Exercise 551 Show that the binary code generated by the matrix in (9.14) is formally
self-dual.
Exercise 552 Show that up to equivalence there are two [6, 3, 2] even formally self-dual
codes and give generator matrices for these codes.
The even formally self-dual binary codes of highest possible minimum weight have been
classified completely through length 28 except for lengths 16, 22, and 26. We give the
number of inequivalent codes in Table 9.12, taken from [82]. In this table, the length is
denoted “n”. The column “d” denotes the highest minimum distance possible as given in
(9.13) except when n = 16 and n = 26 where there are no codes of any type meeting the
bound by [32]; in these two cases, the value of d is reduced by 2 in the table (and marked
with “∗”). The column “# s.d.” gives the number of inequivalent [n, n/2, d] self-dual codes;
the column “# e.f.s.d. (not s.d.)” gives the number of inequivalent [n, n/2, d] even formally
self-dual codes that are not self-dual. As we saw in Examples 9.4.2 and 9.9.3, there is exactly
one [10, 5, 4] even formally self-dual code and that code is not self-dual; the original proof
of this, different from that presented in these examples, is found in [165]. The lengths 12,
14, and 20 cases were presented in [12], [83], and [84], respectively. Also in [84] more than
1000 inequivalent codes of length 22 were shown to exist. Binary [18, 9, 6] and [28, 14, 8]
codes (not necessarily formally self-dual) were shown to be unique up to equivalence in
[309] and [156], respectively; in each case these codes in fact turned out to be even formally
self-dual. There are no self-dual codes of length 34 and minimum distance 8 (which would
meet the bound of Theorem 9.3.5) by [58]. The complete classification of self-dual codes
of lengths 18 and 20 [262] shows that there are no such codes of minimum distance 6.
Similarly, the complete classification of self-dual codes of lengths 28 and 30 [53, 56, 264]
shows that there are no such codes of minimum distance 8. At lengths 40, 42, and 44, the
bound on the minimum distance of self-dual codes is 8, by Theorem 9.3.5. A number of the
even formally self-dual codes counted in Table 9.12 have double circulant constructions.
All those of lengths through 12 have a double circulant construction. It can be shown that
the [14, 7, 4] self-dual code does not have a double circulant construction but exactly two
of the nine even formally self-dual codes of length 14 that are not self-dual do have such
constructions. It can also be shown that the length 18 code does also, as do two of the
seven even formally self-dual codes of length 20. All the codes of length through 20, except
possibly those of length 16, are isodual. For the minimum weights given in the table, length
22 is the smallest length for which a non-isodual even formally self-dual code is known to
exist.
In Table 9.12, we see that there exist even formally self-dual codes with higher minimum
weight than self-dual codes at lengths n = 10, 18, 20, 28, 30, 34, 42, and 44. Relatively
little is known about even formally self-dual codes at lengths greater than 48.
Research Problem 9.9.4 The bound in (9.13) on the minimum distance for even formally
self-dual binary codes of length n = 40 is d ≤ 10, while the bound on self-dual codes of
that length is d ≤ 8 by Theorem 9.3.5. No [40, 20, 10] even formally self-dual binary code,
which necessarily must be non-self-dual, is known. Find one or show it does not exist.
By Theorem 9.3.5, if C is an [n, n/2, d] self-dual binary code with n ≡ 0 (mod 24), then
d ≤ 4 ⌊n/24⌋ + 4, and when this bound is met, C is Type II. When n ≡ 0 (mod 24), there
are no known even formally self-dual codes with d = 4 ⌊n/24⌋ + 4 that are not self-dual.
Note that this value of d may not be the highest value possible; by [32] it is the highest for
n = 24, 48, and 72.
Research Problem 9.9.5 Find a [24m, 12m, 4m + 4] even formally self-dual binary code
that is not self-dual or prove that none exists.
When codes are extremal Type II, the codewords support designs by Theorem 9.3.10.
The following theorem, found in [165], illustrates the presence of designs in certain even
formally self-dual codes.
Theorem 9.9.6 Let C be an [n, n/2, d] even formally self-dual binary code with d =
2 ⌊n/8⌋ + 2 and n > 2. Let Sw be the set of vectors of weight w in C together with those of
weight w in C ⊥ ; a vector in C ∩ C ⊥ is included twice in Sw . Then the vectors in Sw hold a
3-design if n ≡ 2 (mod 8) and a 1-design if n ≡ 6 (mod 8).
We note that this result holds only for the codes of lengths 6, 10, 14, 18, 22, and 30 from
(9.13) and Table 9.12.
Exercise 553 Let C be the [10, 5, 4] even formally self-dual code C from Example 9.9.3.
(a) Show that C ∩ C ⊥ = {0, 1}.
(b) By part (a) and Theorem 9.9.6, the codewords of weight 4 in C ∪ C ⊥ hold a 3-(10, 4, λ)
design. Find λ.
(c) Show that the weight 4 codewords of C do not support a 3-design.
9.10 Additive codes over F4
We now turn our attention to a class of codes of interest because of their connection to
quantum error-correction and quantum computing [44]. These codes over F4 may not be
linear over F4 but are linear over F2 . With an appropriate definition of inner product, the
codes used in quantum error-correction are self-orthogonal. Many of the ideas already
presented in this chapter carry over to this family of codes.
An additive code C over F4 of length n is a subset of F4^n closed under vector addition. So C is an additive subgroup of F4^n. Because x + x = 0 for x ∈ C, C is a binary vector space. Thus C has 2^k codewords for some k with 0 ≤ k ≤ 2n; C will be referred to as an (n, 2^k) code or an (n, 2^k, d) code if the minimum weight d is known. Note that an additive code over F4 differs from a linear code over F4 in that it may not be closed under scalar multiplication and hence may be nonlinear over F4. An (n, 2^k) additive code C has a binary basis of k codewords. A generator matrix for C is any k × n matrix whose rows form a binary basis
for C.
Example 9.10.1 As a linear code over F4 , the [6, 3, 4] hexacode has generator matrix
    [ 1 0 0 1 ω ω ]
    [ 0 1 0 ω 1 ω ] .
    [ 0 0 1 ω ω 1 ]

However, thinking of the hexacode as an additive (6, 2^6, 4) code, it has generator matrix

    [ 1 0 0 1 ω  ω  ]
    [ ω 0 0 ω ω̄  ω̄ ]
    [ 0 1 0 ω 1  ω  ]
    [ 0 ω 0 ω̄ ω  ω̄ ] .
    [ 0 0 1 ω ω  1  ]
    [ 0 0 ω ω̄ ω̄  ω ]
The second, fourth, and sixth rows of the latter matrix are obtained by multiplying the three
rows of the former matrix by ω.
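The additive view of the hexacode can be checked numerically. In the sketch below (Python; the integer encoding of F4 is ours: 0, 1, 2, 3 stand for 0, 1, ω, ω̄, with addition given by XOR), the second, fourth, and sixth generator rows are computed as ω times the rows of the linear generator matrix, and the binary span is confirmed to be a (6, 2^6, 4) code.

```python
from itertools import product

# Our encoding of F4: 0, 1, 2, 3 stand for 0, 1, ω, ω̄; addition is XOR.
LOG, EXP = {1: 0, 2: 1, 3: 2}, [1, 2, 3]

def f4_mul(a, b):
    """Multiply in F4; the nonzero elements form a cyclic group of order 3."""
    return 0 if 0 in (a, b) else EXP[(LOG[a] + LOG[b]) % 3]

W = 2  # ω
G_linear = [[1, 0, 0, 1, W, W],
            [0, 1, 0, W, 1, W],
            [0, 0, 1, W, W, 1]]

# Interleave each row with its ω-multiple, as in the example.
G_additive = []
for row in G_linear:
    G_additive.append(row)
    G_additive.append([f4_mul(W, x) for x in row])

# Binary span of the six rows: an additive (6, 2^6) code.
hexacode = set()
for coeffs in product([0, 1], repeat=6):
    w = [0] * 6
    for c, r in zip(coeffs, G_additive):
        if c:
            w = [x ^ y for x, y in zip(w, r)]
    hexacode.add(tuple(w))

min_wt = min(sum(1 for x in w if x) for w in hexacode if any(w))
```

Since the F2-span of {r, ωr} is {0, r, ωr, ω̄r}, this additive code coincides with the F4-linear hexacode.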
In the connection between quantum codes and additive codes over F4 [44], a natural
inner product, called the trace inner product, arises for the additive codes. Recall from
Section 3.8 that the trace Tr2 : F4 → F2 is defined by Tr2(α) = α + α^2. Alternately, since conjugation in F4 is given by ᾱ = α^2, Tr2(α) = α + ᾱ. Thus Tr2(0) = Tr2(1) = 0 and Tr2(ω) = Tr2(ω̄) = 1. By Lemma 3.8.5, Tr2(α + β) = Tr2(α) + Tr2(β), indicating that Tr2 is F2-linear. The trace inner product ⟨·, ·⟩T on F4^n is defined as follows. For x = x1x2 · · · xn and y = y1y2 · · · yn in F4^n, let

    ⟨x, y⟩T = Tr2(⟨x, y⟩) = Σ_{i=1}^n (xi ȳi + x̄i yi),           (9.15)
where ⟨·, ·⟩ is the Hermitian inner product. The right-most equality in (9.15) follows from the F2-linearity of Tr2 and the fact that conjugation is an involution (the conjugate of ᾱ is α). In working with the trace inner product it is helpful
384
Self-dual codes
to observe that Tr2(αβ̄) = 0 if α = β or α = 0 or β = 0, and Tr2(αβ̄) = 1 otherwise. The proof that the following basic properties of the trace inner product hold is left as an exercise.
Lemma 9.10.2 If x, y, and z are in F4^n, then:
(a) ⟨x, x⟩T = 0,
(b) ⟨x, y⟩T = ⟨y, x⟩T,
(c) ⟨x + y, z⟩T = ⟨x, z⟩T + ⟨y, z⟩T, and
(d) ⟨x, ωx⟩T = 0 if wt(x) is even, while ⟨x, ωx⟩T = 1 if wt(x) is odd.
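Lemma 9.10.2 can be verified exhaustively for small n. This Python sketch (ours, using an integer encoding of F4 that is our own choice: 0, 1, 2, 3 for 0, 1, ω, ω̄, with addition given by XOR) checks properties (a)–(d) for all vectors of length 2.

```python
from itertools import product

# Our encoding of F4: 0, 1, 2, 3 for 0, 1, ω, ω̄; addition is XOR.
LOG, EXP = {1: 0, 2: 1, 3: 2}, [1, 2, 3]
CONJ = {0: 0, 1: 1, 2: 3, 3: 2}   # conjugation fixes 0, 1 and swaps ω, ω̄
W = 2                             # ω

def f4_mul(a, b):
    return 0 if 0 in (a, b) else EXP[(LOG[a] + LOG[b]) % 3]

def tr2(a):
    """Tr2 is 0 on {0, 1} and 1 on {ω, ω̄}."""
    return 0 if a in (0, 1) else 1

def trace_ip(x, y):
    """⟨x, y⟩_T = Tr2(Σ x_i ȳ_i), computed coordinatewise by F2-linearity."""
    return sum(tr2(f4_mul(a, CONJ[b])) for a, b in zip(x, y)) % 2

# Exhaustively verify Lemma 9.10.2 for n = 2.
vectors = list(product(range(4), repeat=2))
lemma_ok = True
for x in vectors:
    wx = tuple(f4_mul(W, a) for a in x)
    lemma_ok &= trace_ip(x, x) == 0                               # (a)
    lemma_ok &= trace_ip(x, wx) == sum(1 for a in x if a) % 2     # (d)
    for y in vectors:
        lemma_ok &= trace_ip(x, y) == trace_ip(y, x)              # (b)
        for z in vectors:
            xy = tuple(a ^ b for a, b in zip(x, y))
            lemma_ok &= (trace_ip(xy, z)
                         == (trace_ip(x, z) + trace_ip(y, z)) % 2)  # (c)
```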
Exercise 554 Compute ⟨1ω1ωωω1, 1ωωωω1ω⟩T.
Exercise 555 Prove Lemma 9.10.2.
As usual, define the trace dual C⊥T of an additive code C to be

    C⊥T = {v ∈ F4^n | ⟨u, v⟩T = 0 for all u ∈ C}.
Exercise 556 shows that if C is an additive (n, 2^k) code, then C⊥T is an additive (n, 2^{2n−k}) code. The additive code C is trace self-orthogonal if C ⊆ C⊥T and trace self-dual if C = C⊥T. Thus a trace self-dual additive code is an (n, 2^n) code. Höhn [134] proved that the same
MacWilliams equation that holds for linear codes over F4 also holds for additive codes over
F4 ; namely (see (M3 ))
    W_{C⊥T}(x, y) = (1/|C|) WC(y − x, y + 3x).
Exercise 556 Prove that if C is an additive (n, 2^k) code, then C⊥T is an additive (n, 2^{2n−k}) code.
To obtain an equivalent additive code C 2 from the additive code C 1 , as with linear codes
over F4 , we are allowed to permute and scale coordinates of C 1 . With linear codes we
are then permitted to conjugate the resulting vectors; with additive codes we can in fact
conjugate more generally as follows. Two additive codes C 1 and C 2 are equivalent provided
there is a map sending the codewords of C 1 onto the codewords of C 2 , where the map
consists of a permutation of coordinates followed by a scaling of coordinates by elements
of F4 followed by conjugation of some of the coordinates. In terms of generator matrices,
if G 1 is a generator matrix of C 1 , then there is a generator matrix of C 2 obtained from
G 1 by permuting columns, scaling columns, and then conjugating some of the resulting
columns. In the natural way we now define the automorphism group of the additive code C, denoted ΓAut(C), to consist of all maps that permute coordinates, scale coordinates,
and conjugate coordinates that send codewords of C to codewords of C. By Exercise 557
permuting coordinates, scaling coordinates, and conjugating some coordinates of a trace
self-orthogonal (or trace self-dual) code does not change trace self-orthogonality (or trace
self-duality). However, Exercise 558 shows that if C 1 is linear over F4 and if C 2 is obtained
from C 1 by conjugating only some of the coordinates, then C 2 may not be linear!
Exercise 557 Verify that if x′ and y′ are obtained from the vectors x and y in F4^n by permuting coordinates, scaling coordinates, and conjugating some of the coordinates, then ⟨x′, y′⟩T = ⟨x, y⟩T.
Exercise 558 Find an example of a code C that is linear over F4 such that when the first
coordinate only is conjugated, the resulting code is no longer linear over F4 .
We now focus on trace self-dual additive codes. For linear codes over F4 , there is no
distinction between Hermitian self-dual and trace self-dual codes.
Theorem 9.10.3 Let C be a linear code over F4 . Then C is Hermitian self-dual if and only
if C is trace self-dual.
Proof: First assume that C is an [n, n/2] Hermitian self-dual code. Then ⟨x, y⟩ = 0 for all x and y in C. Thus ⟨x, y⟩T = Tr2(⟨x, y⟩) = 0, implying that C ⊆ C⊥T. As an additive code, C is an (n, 2^n) code and hence, by Exercise 556, C⊥T is also an (n, 2^n) code. Thus C = C⊥T and C is trace self-dual.
Now assume that C is an (n, 2^n) trace self-dual code. Because C is a linear code over F4, if x ∈ C, then ωx ∈ C and so ⟨x, ωx⟩T = 0. By Lemma 9.10.2(d), wt(x) is even for all x ∈ C. By Theorem 1.4.10, C ⊆ C⊥H. Because C has dimension n/2 as a linear code over F4, so does C⊥H, implying that C is Hermitian self-dual.
Unlike Hermitian self-dual codes, which contain only even weight codewords, trace self-dual additive codes may contain odd weight codewords; such codes cannot be linear over F4 by Theorem 9.10.3. This leads to the following definitions. An additive code C is Type II if C is trace self-dual and all codewords have even weight. Exercise 383 applies to additive codes over F4 as the MacWilliams equation (M3) holds. In particular, if C is Type II of length n, then C⊥T = C contains a codeword of weight n, implying that n is even. By Corollary 9.2.2, F4-linear Hermitian self-dual codes exist if and only if n is even. Hence Type II additive codes exist if and only if n is even. If C is trace self-dual but some codeword has odd weight (in which case the code cannot be F4-linear), the code is Type I. The n × n identity matrix generates an (n, 2^n) Type I additive code, implying Type I codes exist for all n.
There is an analog to Gleason’s Theorem for Type II additive codes, which is identical
to Gleason’s Theorem for Hermitian self-dual linear codes; see [291, Section 7.7].
Theorem 9.10.4 If C is a Type II additive code, then

    WC(x, y) = Σ_{i=0}^{⌊n/6⌋} ai g6(x, y)^{n/2−3i} g7(x, y)^i,

where g6(x, y) = y^2 + 3x^2 and g7(x, y) = y^6 + 45x^4y^2 + 18x^6. All ai's are rational and Σi ai = 1.
There is also a bound on the minimum distance of a trace self-dual additive code [291, Theorem 33]. If dI and dII are the minimum distances of Type I and Type II additive codes,
Table 9.13 Number of inequivalent additive trace self-dual codes

Length           1    2    3    4    5    6    7    8
Number Type I    1    1    3    4   11   20   59    ?
Number Type II   0    1    0    2    0    6    0   21
respectively, of length n > 1, then

    dI ≤ 2⌊n/6⌋ + 1  if n ≡ 0 (mod 6),
    dI ≤ 2⌊n/6⌋ + 3  if n ≡ 5 (mod 6),
    dI ≤ 2⌊n/6⌋ + 2  otherwise,

and

    dII ≤ 2⌊n/6⌋ + 2.
A code that meets the appropriate bound is called extremal. Extremal Type II codes have a
unique weight enumerator. This property is not true for extremal Type I codes.
As with our other self-dual codes, there is a mass formula [134] for counting the number
of trace self-dual additive codes of a given length.
Theorem 9.10.5 We have the following mass formulas:
(i) For trace self-dual additive codes of length n,

    Σj 6^n n! / |ΓAut(Cj)| = Π_{i=1}^{n} (2^i + 1).

(ii) For Type II additive codes of even length n,

    Σj 6^n n! / |ΓAut(Cj)| = Π_{i=0}^{n−1} (2^i + 1).

In both cases, the summation is over all j, where {Cj} is a complete set of representatives of inequivalent codes of the given type.
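For very small lengths the right-hand side of the mass formula in (i) can be checked against a brute-force enumeration: summing the class sizes 6^n n!/|ΓAut(Cj)| over inequivalent codes is the same as counting all distinct trace self-dual additive codes. The Python sketch below (ours; the integer encoding of F4 as 0, 1, 2, 3 for 0, 1, ω, ω̄ is our own) enumerates every trace self-dual additive code of lengths 1 and 2 and compares the count with Π(2^i + 1).

```python
from itertools import combinations, product

# Our encoding of F4: 0, 1, 2, 3 for 0, 1, ω, ω̄; addition is XOR.
LOG, EXP = {1: 0, 2: 1, 3: 2}, [1, 2, 3]
CONJ = {0: 0, 1: 1, 2: 3, 3: 2}

def f4_mul(a, b):
    return 0 if 0 in (a, b) else EXP[(LOG[a] + LOG[b]) % 3]

def trace_ip(x, y):
    """Trace inner product; Tr2 is 0 on {0, 1} and 1 on {ω, ω̄}."""
    return sum(0 if f4_mul(a, CONJ[b]) in (0, 1) else 1
               for a, b in zip(x, y)) % 2

def all_trace_self_dual(n):
    """All distinct (n, 2^n) trace self-dual additive codes (brute force)."""
    nonzero = [v for v in product(range(4), repeat=n) if any(v)]
    codes = set()
    for gens in combinations(nonzero, n):
        span = {tuple([0] * n)}
        for g in gens:
            span |= {tuple(a ^ b for a, b in zip(w, g)) for w in span}
        if len(span) != 2 ** n:
            continue  # generators were dependent over F2
        # self-orthogonal of size 2^n means self-dual (Exercise 556)
        if all(trace_ip(u, v) == 0 for u in span for v in span):
            codes.add(frozenset(span))
    return codes

def mass_rhs(n):
    """Right-hand side of mass formula (i): product of (2^i + 1)."""
    out = 1
    for i in range(1, n + 1):
        out *= 2 ** i + 1
    return out
```

For n = 1 this finds the three codes {0, 1}, {0, ω}, {0, ω̄}; for n = 2 it finds 15 codes, matching (2 + 1)(4 + 1).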
With the assistance of this mass formula, Höhn [134] has classified all Type I additive codes
of lengths n ≤ 7 and all Type II additive codes of lengths n ≤ 8. Table 9.13 gives the number
of inequivalent codes for these lengths.
The Balance Principle and the rest of Theorem 9.4.1 also hold for trace self-dual additive
codes with minor modifications. Exercises 560, 561, and 562 use this theorem to find all
inequivalent trace self-dual additive codes of lengths 2 and 3 and all inequivalent extremal
ones of length 4.
Theorem 9.10.6 ([93]) Let C be an (n, 2^n) trace self-dual additive code. Choose a set of coordinate positions Pn1 of size n1 and let Pn2 be the complementary set of coordinate
Table 9.14 Extremal Type I and II additive codes of length 2 ≤ n ≤ 16

 n   dI    numI   dII   numII       n    dI    numI   dII   numII
 2   1O    1      2E    1          10    4E    ≥51    4E    ≥5
 3   2E    1      –     –          11    5E    1      –     –
 4   2E    1      2E    2          12    5E    ≥7     6E    1
 5   3E    1      –     –          13    5O    ≥5     –     –
 6   3E    1      4E    1          14    6E    ?      6E    ≥490
 7   3O    3      –     –          15    6E    ≥2     –     –
 8   4E    2      4E    3          16    6E    ≥3     6E    ≥4
 9   4E    8      –     –
positions of size n2 = n − n1. Let Ci be the (n, 2^{ki}) subcode of C all of whose codewords have support in Pni. The following hold:
(i) (Balance Principle) k1 − n1 = k2 − n2.
(ii) If we reorder coordinates so that Pn1 is the left-most n1 coordinates and Pn2 is the right-most n2 coordinates, then C has a generator matrix of the form

        [ A  O ]
    G = [ O  B ] ,
        [ D  E ]

where [A O] is a generator matrix of C1 and [O B] is a generator matrix of C2, O being the appropriate size zero matrix. We also have the following:
(a) D and E each have n − k1 − k2 independent rows over F2.
(b) Let A be the code of length n1 generated by A, AD the code of length n1 generated by the rows of A and D, B the code of length n2 generated by B, and BE the code of length n2 generated by the rows of B and E. Then A⊥T = AD and B⊥T = BE.
Exercise 559 Prove Theorem 9.10.6.
Table 9.14, from [93], summarizes what is known about the number of inequivalent
extremal Type I and Type II additive codes through length 16. In the table, codes have
length “n”; “dI” is the largest minimum weight for which a Type I code is known to exist, while “dII” is the largest minimum weight for which a Type II code is known to exist. The superscript “E” indicates that the code is extremal; the superscript “O” indicates the code is not extremal, but no code of the given type can exist with a larger minimum weight, and so the code is optimal. The number of inequivalent Type I and II codes of the given minimum weight is listed under “numI” and “numII”, respectively; when the number in the column
is exact (without ≥), the classification of those codes is complete. There exist extremal
Type II additive codes for all even lengths from 2 to 22 inclusive. In this range, except for
length 12, extremal Hermitian self-dual linear codes provide examples of extremal Type II
additive codes; see Table 9.3 and Example 9.10.8. An extremal Type II additive code of
length 24 has minimum weight 10; no such F4 -linear Hermitian self-dual code exists [183].
The existence of a (24, 2^24, 10) Type II code is still an open question, however.
Research Problem 9.10.7 Either construct or establish the nonexistence of a (24, 2^24, 10) Type II additive code.
Example 9.10.8 By Table 9.3 there is no [12, 6, 6] Hermitian self-dual code. However, Table 9.14 indicates that there is a unique (12, 2^12, 6) Type II additive code. This code C is called the dodecacode and has generator matrix

    [ 0 0 0 0 0 0 1 1 1 1 1 1 ]
    [ 0 0 0 0 0 0 ω ω ω ω ω ω ]
    [ 1 1 1 1 1 1 0 0 0 0 0 0 ]
    [ ω ω ω ω ω ω 0 0 0 0 0 0 ]
    [ 0 0 0 1 ω ω 0 0 0 1 ω ω ]
    [ 0 0 0 ω ω 1 0 0 0 ω ω 1 ]
    [ 1 ω ω 0 0 0 1 ω ω 0 0 0 ] .
    [ ω 1 ω 0 0 0 ω 1 ω 0 0 0 ]
    [ 0 0 0 1 ω ω ω ω 1 0 0 0 ]
    [ 0 0 0 ω 1 ω 1 ω ω 0 0 0 ]
    [ 1 ω ω 0 0 0 0 0 0 ω ω 1 ]
    [ ω 1 ω 0 0 0 0 0 0 1 ω ω ]

The dodecacode has weight enumerator

    WC(x, y) = y^12 + 396x^6y^6 + 1485x^8y^4 + 1980x^10y^2 + 234x^12.
The existence of the dodecacode, despite the nonexistence of a [12, 6, 6] Hermitian
self-dual code, leads one to hope that the code considered in Research Problem 9.10.7
exists.
Exercise 560 Use the generator matrix given in Theorem 9.10.6 to construct generator
matrices for the inequivalent trace self-dual additive codes of length n = 2. (Hint: Let n 1 be
the minimum weight and place a minimum weight codeword in the first row of the generator
matrix.) In the process of this construction verify that there are only two inequivalent codes,
consistent with Table 9.13. Also give the weight distribution of each and identify which is
Type I and which is Type II. Which are extremal?
Exercise 561 Repeat Exercise 560 with n = 3. This time there will be three inequivalent
codes, verifying the entry in Table 9.13.
Exercise 562 Use the generator matrix given in Theorem 9.10.6 to construct generator
matrices for all the inequivalent extremal trace self-dual additive codes of length n = 4.
(Hint: Let n 1 = 2 and place a minimum weight codeword in the first row of the generator
matrix.) In the process of this construction verify that there are only three inequivalent
extremal codes, consistent with Table 9.14. Also give the weight distribution of each and
identify which is Type I and which is Type II.
9.11 Proof of the Gleason–Pierce–Ward Theorem
In this section we prove the Gleason–Pierce–Ward Theorem stated at the beginning of the
chapter. Our proof comes from [347].
We begin with some notation. Let C be a linear code of length n over Fq, where q = p^m with p a prime. As usual Fq∗ denotes the nonzero elements of Fq. Define the map τ : Fq → Fq by τ(0) = 0 and τ(α) = α^{−1} for all α ∈ Fq∗. This map will be crucial to the proof of the
Gleason–Pierce–Ward Theorem; its connection with the theorem is hinted at in the following
lemma.
Lemma 9.11.1 We have the following:
(i) τ(αβ) = τ(α)τ(β) for all q and all α, β ∈ Fq.
(ii) When q = 2 or 3, τ is the identity map, and when q = 4, τ is the map τ(α) = α^2.
(iii) τ is also additive (that is, τ is a field automorphism) if and only if q = 2, 3, or 4.
Proof: Part (i) was proved in Exercise 176(a), while part (ii) is a straightforward exercise. One direction of part (iii) is proved in Exercise 176(c). We give another proof different from the one suggested in the hint in that exercise. Assume that τ is a field automorphism of Fq. Then τ(α) = α^{p^r} for some r with 0 ≤ r < m and all α ∈ Fq by Theorem 3.6.1. So if α ∈ Fq∗, then α^{p^r} = α^{−1} and hence α^{p^r + 1} = 1. This holds when α is primitive, yielding (q − 1) | (p^r + 1), implying p^m − 1 = p^r + 1 if q > 2 by the restriction 0 ≤ r < m. The only solutions of p^m − 1 = p^r + 1 with q > 2 are p = 3 with m = 1 and r = 0, and p = 2 with m = 2 and r = 1. So q = 2, 3, or 4. For the converse we assume q = 2, 3, or 4, in which case τ is given in (ii). The identity map is always an automorphism, and the map τ(α) = α^2 in the case q = 4 is the Frobenius automorphism.
Exercise 563 Verify part (ii) of Lemma 9.11.1.
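Lemma 9.11.1(iii) can be tested directly for small fields. The Python sketch below (ours) checks additivity of τ over the prime fields F2, F3, F5, and F7 using modular inverses, and over F4 using a lookup table for τ(α) = α^{−1} = α^2 in our encoding 0, 1, 2, 3 for 0, 1, ω, ω̄ (addition is XOR).

```python
def tau_prime(p):
    """τ on the prime field F_p: τ(0) = 0, τ(a) = a^(p-2) = a^(-1) mod p."""
    return {a: (0 if a == 0 else pow(a, p - 2, p)) for a in range(p)}

def tau_additive_on_prime_field(p):
    """Does τ(a + b) = τ(a) + τ(b) hold for all a, b in F_p?"""
    t = tau_prime(p)
    return all(t[(a + b) % p] == (t[a] + t[b]) % p
               for a in range(p) for b in range(p))

# F4 in our encoding (0, 1, 2, 3 for 0, 1, ω, ω̄; addition is XOR):
# τ(α) = α^(-1) = α^2 fixes 0 and 1 and swaps ω with ω̄.
TAU4 = {0: 0, 1: 1, 2: 3, 3: 2}
tau_additive_on_f4 = all(TAU4[a ^ b] == TAU4[a] ^ TAU4[b]
                         for a in range(4) for b in range(4))
```

Among the fields tested, only F2, F3, and F4 pass, in agreement with part (iii).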
We now introduce notation that allows us to compare components of two different vectors. Let u = u1 · · · un and v = v1 · · · vn be vectors in Fq^n. For α ∈ Fq define

    α_{u,v} = |{i | ui ≠ 0 and vi = αui}|, and
    ∞_{u,v} = |{i | ui = 0 and vi ≠ 0}|.

We have the following elementary properties of α_{u,v} and ∞_{u,v}.

Lemma 9.11.2 Let C be a divisible linear code over Fq with divisor Δ. Let u and v be vectors in Fq^n and α an element of Fq. Then:
(i) wt(u) = Σ_{α∈Fq} α_{u,v},
(ii) wt(v − αu) = ∞_{u,v} + wt(u) − α_{u,v},
(iii) α_{u,v} ≡ ∞_{u,v} (mod Δ) if u and v are in C, and
(iv) q∞_{u,v} ≡ 0 (mod Δ) if u and v are in C.
Proof: Part (i) is clear because wt(u) is the count of the number of nonzero coordinates of u, and for all such coordinates i, vi = αui for α = vi ui^{−1} ∈ Fq. To prove (ii) we note that the coordinates are divided into four disjoint sets for each α:
(a) the coordinates i where ui = vi = 0,
(b) the coordinates i where ui = 0 but vi ≠ 0,
(c) the coordinates i where ui ≠ 0 and vi = αui, and
(d) the coordinates i where ui ≠ 0 and vi ≠ αui.
Clearly, wt(v − αu) is the sum of the number of coordinates in (b) and (d). By definition, there are ∞_{u,v} coordinates in (b). As there are wt(u) total coordinates in (c) and (d), and (c) has α_{u,v} coordinates, then (d) has wt(u) − α_{u,v} coordinates, proving (ii).
Part (iii) follows from part (ii) as wt(v − αu) ≡ wt(u) ≡ 0 (mod Δ) because C is linear and divisible by Δ. Finally, combining (i) and (iii) gives

    0 ≡ wt(u) ≡ Σ_{α∈Fq} α_{u,v} ≡ Σ_{α∈Fq} ∞_{u,v} ≡ q∞_{u,v} (mod Δ),

verifying (iv).
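Parts (i) and (ii) of Lemma 9.11.2, which do not require u and v to lie in a code, can be verified exhaustively over a small field. This Python sketch (ours; function names are our own) checks them for all pairs of vectors of length 4 over F3.

```python
from itertools import product

q, n = 3, 4  # small enough for an exhaustive check

def alpha_uv(alpha, u, v):
    """α_{u,v} = |{i : u_i ≠ 0 and v_i = α u_i}| over F_q (integers mod q)."""
    return sum(1 for ui, vi in zip(u, v) if ui != 0 and vi == (alpha * ui) % q)

def inf_uv(u, v):
    """∞_{u,v} = |{i : u_i = 0 and v_i ≠ 0}|."""
    return sum(1 for ui, vi in zip(u, v) if ui == 0 and vi != 0)

def wt(u):
    return sum(1 for x in u if x != 0)

lemma_holds = True
for u in product(range(q), repeat=n):
    for v in product(range(q), repeat=n):
        # (i): every nonzero u_i is counted by exactly one α in F_q
        lemma_holds &= wt(u) == sum(alpha_uv(a, u, v) for a in range(q))
        for a in range(q):
            # (ii): wt(v − αu) = ∞_{u,v} + wt(u) − α_{u,v}
            v_minus_au = tuple((vi - a * ui) % q for ui, vi in zip(u, v))
            lemma_holds &= (wt(v_minus_au)
                            == inf_uv(u, v) + wt(u) - alpha_uv(a, u, v))
```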
We next examine the situation that will lead to part (v) of the Gleason–Pierce–Ward
Theorem. To state this result, we need the concept of a coordinate functional of a code C over
F_q of length n. For 1 ≤ i ≤ n, define the coordinate functional f_i : C → F_q by f_i(c) = c_i,
where c = c_1 · · · c_n. Let N be the set of nonzero coordinate functionals of C. We define
the binary relation ∼ on N by f_i ∼ f_j provided there exists an element α ∈ F_q^* such that
f_i(c) = α f_j(c) for all c ∈ C. We make two simple observations, which you are asked to prove
in Exercise 564. First, f_i is a linear function and hence its values are completely determined
by its values on the codewords in a generator matrix. Second, ∼ is an equivalence relation on
N. In the next lemma, we will see that C is equivalent to a replicated code whenever C has a
divisor relatively prime to the order of the field. We say that a code C is a Δ-fold replicated
code provided the set of coordinates of C is the disjoint union of subsets of coordinates
of size Δ such that if {i_j | 1 ≤ j ≤ Δ} is any of these subsets and c = c_1 · · · c_n ∈ C, then
c_{i_j} = c_{i_1} for 1 ≤ j ≤ Δ. In terms of coordinate functionals, these functionals are equal
on the subsets of coordinates. The code with generator matrix [I_{n/2} | I_{n/2}] in (v) of the
Gleason–Pierce–Ward Theorem is a two-fold replicated code.
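For a concrete instance, the following sketch enumerates the code over F_3 of length 6 with generator matrix [I_3 | I_3] and confirms that every codeword (u | u) has even weight, so the code is divisible by Δ = 2 (the parameters q = 3 and n = 6 are illustrative choices):

```python
# The code with generator matrix [I_{n/2} | I_{n/2}] consists of all words (u | u);
# every coordinate of u is repeated, so each codeword weight is even.
from itertools import product

q, half = 3, 3
codewords = [u + u for u in product(range(q), repeat=half)]
assert all(sum(1 for x in c if x != 0) % 2 == 0 for c in codewords)
assert len(codewords) == q ** half   # the code has dimension n/2 = 3
```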
Exercise 564 Show the following:
(a) The coordinate functional f_i is a linear function from C to F_q, and its values
are completely determined by its values on the codewords in a generator matrix
of C.
(b) The relation ∼ on the set N of nonzero coordinate functionals is an equivalence relation
on N.
Lemma 9.11.3 Let C be an [n, k] code over F_q with divisor Δ relatively prime to p. Then
the following hold:
(i) Each equivalence class of nonzero coordinate functionals has size a multiple of Δ.
(ii) C is monomially equivalent to a Δ-fold replicated code possibly with some coordinates
that are always 0.
391
9.11 Proof of the Gleason–Pierce–Ward Theorem
Proof: The proof of (i) is by induction on n. For the initial case, we assume that all
codewords have weight n, in which case k = 1 and C is an [n, 1, n] code by the Singleton
Bound. As a generator matrix for C is one codeword u with each entry nonzero, the n
coordinate functionals are clearly equivalent. As 0 ≡ wt(u) ≡ n (mod Δ), the result holds
in this case.
Now assume that u is a nonzero codeword of C with weight w < n. By assumption Δ | w.
By permuting the coordinates (which permutes the coordinate functionals but does not
change the sizes of the equivalence classes), we may assume that u = u_1 · · · u_w 0 · · · 0, where
each u_i is nonzero. If c = c_1 · · · c_n is a codeword in C, write c = (c′ | c′′), where c′ = c_1 · · · c_w
and c′′ = c_{w+1} · · · c_n. Define C′ = {c′ | c = (c′ | c′′) ∈ C} and C′′ = {c′′ | c = (c′ | c′′) ∈ C}. So
C′ and C′′ are both codes of lengths less than n. Notice that no coordinate functional f_i of
C with 1 ≤ i ≤ w is equivalent to any coordinate functional f_j with w + 1 ≤ j ≤ n. Also
the coordinate functionals for C are determined by the coordinate functionals for C′ and
C′′. Therefore if we show that both C′ and C′′ are divisible by Δ, then by induction their
coordinate functionals split into classes with sizes that are multiples of Δ, and thus the
coordinate functionals of C split into classes of exactly the same sizes. To show both C′
and C′′ are divisible by Δ, let c = (c′ | c′′) ∈ C. We need to show that wt(c′) and wt(c′′) are
divisible by Δ. Since wt(c) is divisible by Δ, it suffices to show that wt(c′′) is divisible by Δ.
Clearly, wt(c′′) = ∞_{u,c}; as p ∤ Δ, by Lemma 9.11.2(iv), ∞_{u,c} ≡ 0 (mod Δ). Part (i) now
follows.
Part (ii) is obvious from part (i) as you can rescale the coordinate functionals in an
equivalence class so that the new functionals are identical.
This lemma shows what happens when we have a divisor relatively prime to p. The next
two lemmas show what happens when we have a divisor that is a power of p. We extend
the map τ to codewords of a code by acting with τ componentwise. Note that τ is injective.
The first lemma gives its image.
Lemma 9.11.4 Let C have divisor Δ, where Δ is a power of p. If Δ > 2 or q > 2, then
τ(C) ⊆ C^⊥.
Proof: It suffices to show that if u, v ∈ C, then τ(u) · v = 0. As

α_{u,v} = |{i | u_i ≠ 0 and v_i = αu_i}| = |{i | u_i ≠ 0 and τ(u_i)v_i = α}|,

we have

τ(u) · v = Σ_{α∈F_q} α · α_{u,v},

noting that no component i contributes to the inner product when u_i = 0 or v_i = 0. By
Lemma 9.11.2(iii), α_{u,v} ≡ ∞_{u,v} (mod Δ), and so α_{u,v} ≡ ∞_{u,v} (mod p) as p | Δ. Thus

τ(u) · v = ∞_{u,v} Σ_{α∈F_q} α.

If q > 2, Σ_{α∈F_q} α = 0. If q = 2, then 4 | Δ, implying ∞_{u,v} ≡ 0 (mod 2) by Lemma
9.11.2(iv). Thus in either case τ(u) · v = 0.
Note that if the hypothesis of Lemma 9.11.4 holds and C has dimension n/2, then
τ(C) = C^⊥. We examine the possible values of Δ that could occur when q ∈ {2, 3, 4}.
Lemma 9.11.5 Assume τ(C) = C^⊥ and q ∈ {2, 3, 4}. Then:
(i) p is a divisor of C, and
(ii) the highest power of p dividing C is p if q ∈ {3, 4} and is 4 if q = 2.
Proof: As τ is the identity when q ∈ {2, 3}, C is self-dual. If q = 4, C is Hermitian self-dual
by Lemma 9.11.1(ii). Thus (i) follows from Theorem 1.4.5.
To prove (ii), we permute the coordinates so that C has a generator matrix G_1 = [I_{n/2} A]
in standard form; note that this does not affect the hypothesis of the lemma. As τ(C) = C^⊥,
[I_{n/2} τ(A)] is a generator matrix for C^⊥; hence G_2 = [−τ(A)^T I_{n/2}] is a generator matrix of
C by Theorem 1.2.1. Let u = 10 · · · 0 a_1 · · · a_{n/2} be the first row of G_1. Choose i so that a_i ≠
0; such an entry exists as all weights are divisible by at least 2. Let v = b_1 · · · b_{n/2} 0 · · · 1 · · · 0
be a row of G_2 where the entry 1 of the identity part is in coordinate (n/2) + i. Both u and v are in C, and so 0 =
u · τ(v) = τ(b_1) + a_i, implying τ(b_1) = −a_i. When q = 2, wt(u + v) = wt(u) + wt(v) − 4
as a_i = b_1 = 1, showing that the highest power of 2 dividing C is 4. When q = 3, wt(u + v) =
wt(u) + wt(v) − 3 as {a_i, b_1} = {−1, 1}, implying that the highest power of 3 dividing C is 3.
When q = 4, choose α ∈ F_4 with α ≠ 0 and α ≠ a_i. Then wt(u + αv) = wt(u) + wt(v) − 2,
proving that the highest power of 2 dividing C is 2.
We need one final lemma.
Lemma 9.11.6 Let C be an [n, k] code over F_q divisible by Δ > 1 where either Δ > 2 or
q > 2. Then:
(i) k ≤ n/2, and
(ii) if k = n/2 and f is some integer dividing Δ with f > 1 and p ∤ f, then f = 2 and C
is equivalent to the code over F_q with generator matrix [I_{n/2} I_{n/2}].
Proof: Suppose f | Δ with f > 1 and p ∤ f. Then by Lemma 9.11.3(ii), C is monomially
equivalent to an f-fold replicated code possibly with some coordinates that are always 0.
As f > 1, k ≤ n/f ≤ n/2, which gives (i) in this case. This also shows that if k = n/2,
then f = 2 and there can be no coordinates that are always 0, producing (ii). So assume that
Δ is a power of p; part (i) follows in this case from Lemma 9.11.4 as τ(C) ⊆ C^⊥.
We now complete the proof of the Gleason–Pierce–Ward Theorem. If Δ has a factor
greater than 1 and relatively prime to p, we have (v) by Lemma 9.11.6. So we may assume
that Δ is a power of p. We may also assume that Δ > 2 or q > 2, as otherwise we have (i).
We need to show that we have one (or more) of (ii), (iii), (iv), or (v). As C has dimension
n/2, by Lemma 9.11.4, τ(C) = C^⊥. Permuting columns as previously, we may assume that
G_1 = [I_{n/2} A] is a generator matrix for C and G_2 = [I_{n/2} τ(A)] is a generator matrix for
C^⊥. Each row of A has at least one nonzero entry as C is divisible by at least 2. If every
row and column of A has exactly one nonzero entry, then C satisfies (v). Hence, permuting
columns again, we may assume the first two rows of G_1 are
u = (1, 0, 0, . . . , 0, a, . . . ) and
v = (0, 1, 0, . . . , 0, b, . . . ),
where a and b are nonzero and located in coordinate (n/2) + 1. Let α and β be arbitrary
elements of Fq . Then:
αu + βv = (α, β, 0, . . . , 0, αa + βb, . . . )
and
τ (αu + βv) = (τ (α), τ (β), 0, . . . , 0, τ (αa + βb), . . . ).
As τ (αu + βv) ∈ τ (C) = C ⊥ , by examining the generator matrix G 2 , the only possibility is
τ (αu + βv) = τ (α)τ (u) + τ (β)τ (v) implying that τ (αa + βb) = τ (α)τ (a) + τ (β)τ (b) =
τ (αa) + τ (βb) by Lemma 9.11.1(i). Therefore as a and b are nonzero, τ is additive. By
Lemma 9.11.1(iii), q is 2, 3, or 4. Lemma 9.11.5 completes the proof.
9.12
Proofs of some counting formulas
In this section we will present the proof of the counting formulas given in Section 9.5 for the
number of doubly-even codes of length n. The basis for the proof is found in the orthogonal
decomposition of even binary codes given in Section 7.8. Recall from Theorem 7.8.5 that
an even binary code C can be decomposed as an orthogonal sum of its hull of dimension r
and either m H-planes or m − 1 H-planes and one A-plane; C has dimension 2m + r . This
decomposition partially determines the form (H, O, or A) of C: if one plane is an A-plane,
the code has form A; if there are m H-planes, C has form H if the hull of C is doubly-even
and form O if the hull of C is singly-even.
Let E n be the [n, n − 1] binary code of all even weight vectors where n is even. We can
determine the form of E n merely by knowing the value of n. In order to show this, we need
the following lemma.
Lemma 9.12.1 If C is a doubly-even binary code of dimension m that does not contain the
all-one vector 1, then C is contained in an orthogonal sum of m H-planes. Furthermore 1
is orthogonal to and disjoint from this orthogonal sum.
Proof: Let {x_1, x_2, . . . , x_m} be a basis of C. To prove the result we will construct
{y_1, y_2, . . . , y_m} so that H_i = span{x_i, y_i} is an H-plane for 1 ≤ i ≤ m and
span{x_1, . . . , x_m, y_1, . . . , y_m} is the orthogonal sum H_1 ⊥ H_2 ⊥ · · · ⊥ H_m.
To construct y_1, first note that x_1 ∉ span{x_2, . . . , x_m, 1} since 1 ∉ C. Thus
span{x_2, . . . , x_m, 1}^⊥ \ span{x_1}^⊥ is nonempty. Let y_1 be in this set. As y_1 ⊥ 1, y_1 has
even weight. As y_1 is not orthogonal to x_1 and x_1 is doubly-even, H_1 = span{x_1, y_1} is an H-plane by Exercise 416. Clearly, x_1 ∉ span{x_2, . . . , x_m}. Neither y_1 nor x_1 + y_1 is in span{x_2, . . . , x_m},
as otherwise y_1 ∈ C, which is a contradiction. Therefore, H_1 ∩ span{x_2, . . . , x_m} = {0}.
Clearly, x_1 and y_1 are orthogonal to span{x_2, . . . , x_m}. Thus span{x_1, x_2, . . . , x_m, y_1} =
H_1 ⊥ span{x_2, . . . , x_m}.
We now want to choose y_2 ∈ span{x_1, x_3, . . . , x_m, y_1, 1}^⊥ with y_2 ∉ span{x_2}^⊥. If no such
y_2 exists, then x_2 ∈ span{x_1, x_3, . . . , x_m, y_1, 1}, implying x_2 = Σ_{i≠2} α_i x_i + βy_1 + γ1. As
x_1 · x_2 = 0, we obtain β = 0 and hence x_2 ∈ span{x_1, x_3, . . . , x_m, 1}, which is a contradiction. So y_2 exists; as with y_1, it is of even weight and H_2 = span{x_2, y_2} is an H-plane. By
our choices, H_2 is orthogonal to both H_1 and span{x_3, . . . , x_m}. If {x_1, x_2, . . . , x_m, y_1, y_2}
is not linearly independent, then y2 ∈ span{x1 , x2 , . . . , xm , y1 }; this is a contradiction as x2
is orthogonal to span{x1 , x2 , . . . , xm , y1 } but not y2 . Thus span{x1 , x2 , . . . , xm , y1 , y2 } =
H1 ⊥H2 ⊥span{x3 , . . . , xm }.
The rest of the proof follows inductively with Exercise 566 proving the last
statement.
Exercise 565 In the proof of Lemma 9.12.1, show how to construct the vector
y3 and the H-plane H3 as the next step in the induction process and show that
span{x1 , x2 , . . . , xm , y1 , y2 , y3 } = H1 ⊥H2 ⊥H3 ⊥span{x4 , . . . , xm }.
Exercise 566 Prove the last statement in Lemma 9.12.1. Hint: Suppose 1 ∈ H_1 ⊥ · · · ⊥ H_m.
Write 1 = Σ_{i=1}^m a_i x_i + Σ_{i=1}^m b_i y_i. Then examine x_i · 1 for each i.
Theorem 9.12.2 ([270]) The following hold:
(i) If n ≡ 0 (mod 8), then E n has form H.
(ii) If n ≡ 4 (mod 8), then E n has form A.
(iii) If n ≡ 2 (mod 4), then E n has form O.
Proof: By Theorem 7.8.5, decompose E_n as an orthogonal sum of its hull and either m
H-planes or m − 1 H-planes and one A-plane. The hull of E_n is H = E_n ∩ E_n^⊥ = {0, 1}.
Therefore the dimension of E_n is n − 1 = 2m + 1 as r = 1 is the dimension of H; in
particular m = (n − 2)/2. The hull of E_n is singly-even if n ≡ 2 (mod 4), implying by
Theorem 7.8.5 that E_n has form O, giving (iii).
Suppose that n ≡ 4 (mod 8). As H is doubly-even, E n has either form H or A. If it has
form H, then E n = span{1}⊥H1 ⊥ · · · ⊥Hm . Letting xi be a doubly-even vector in Hi , we
have the set {1, x1 , . . . , xm } generating a doubly-even subcode of dimension m + 1 = n/2;
such a subcode is Type II, contradicting Corollary 9.2.2, proving (ii).
Finally, suppose that n ≡ 0 (mod 8). Then E_n contains the Type II subcode equal to the
direct sum of n/8 copies of the [8, 4, 4] extended Hamming code Ĥ3. Thus E_n contains a
doubly-even subcode C′ of dimension n/2. Let C be any [n, n/2 − 1] vector space complement of 1 in C′. By Lemma 9.12.1, C is contained in the orthogonal sum H_1 ⊥ · · · ⊥ H_{n/2−1}
of n/2 − 1 H-planes. This orthogonal sum is orthogonal to, and disjoint from, the hull
H = span{1} of E_n. Thus span{1} ⊥ H_1 ⊥ · · · ⊥ H_{n/2−1} ⊆ E_n; as both codes have dimension
n − 1, they are equal, proving (i).
Once we know the form of E_n, we can count the number of singly- and doubly-even
vectors in F_2^n.
Corollary 9.12.3 Let n be even. Let a be the number of doubly-even vectors in F_2^n and let
b be the number of singly-even vectors in F_2^n. The values of a and b are as follows:
(i) If n ≡ 0 (mod 8), then a = 2^{n−2} + 2^{n/2−1} and b = 2^{n−2} − 2^{n/2−1}.
(ii) If n ≡ 4 (mod 8), then a = 2^{n−2} − 2^{n/2−1} and b = 2^{n−2} + 2^{n/2−1}.
(iii) If n ≡ 2 (mod 4), then a = b = 2^{n−2}.
Proof: This follows directly from Theorems 7.8.6 and 9.12.2.
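The three cases of the corollary are easy to confirm by direct count, since the number of weight-w vectors in F_2^n is the binomial coefficient (n choose w); the following sketch checks several small even lengths:

```python
from math import comb

# Count doubly-even (weight = 0 mod 4) and singly-even (weight = 2 mod 4)
# vectors in F_2^n and compare with the formulas of Corollary 9.12.3.
for n in (8, 10, 12, 16, 18, 20, 24):
    a = sum(comb(n, w) for w in range(0, n + 1, 4))  # doubly-even count
    b = sum(comb(n, w) for w in range(2, n + 1, 4))  # singly-even count
    if n % 8 == 0:
        assert (a, b) == (2**(n-2) + 2**(n//2 - 1), 2**(n-2) - 2**(n//2 - 1))
    elif n % 8 == 4:
        assert (a, b) == (2**(n-2) - 2**(n//2 - 1), 2**(n-2) + 2**(n//2 - 1))
    else:  # n = 2 (mod 4)
        assert a == b == 2**(n-2)
```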
We need one more result before we can prove the counting formula for the number of
Type II codes of length n where n ≡ 0 (mod 8).
Lemma 9.12.4 For i = 1 and 2, let C_i be an [n, k_i] even binary code with C_i ∩ C_i^⊥ = {0}
and k_1 + 2 ≤ k_2. Suppose that C_1 ⊂ C_2. Then there exists either an H-plane or A-plane P
in C_2 orthogonal to C_1 with C_1 ∩ P = {0}.
Proof: Let {x_1, . . . , x_{k_1}} be a basis of C_1. Define the linear function f : C_2 → F_2^{k_1} by f(x) =
(x · x_1, . . . , x · x_{k_1}). If f|_{C_1} is the restriction of f to C_1, the kernel of f|_{C_1} is C_1 ∩ C_1^⊥ = {0}.
Thus f|_{C_1}, and hence f, is surjective. The kernel K of f is an [n, k_2 − k_1] code orthogonal to
C_1. As C_1 ∩ C_1^⊥ = {0}, K ∩ C_1 = {0} and so C_2 = C_1 ⊥ K. If x ∈ K ∩ K^⊥, then x ∈ C_2 ∩ C_2^⊥;
hence K ∩ K^⊥ = {0}. The result follows by Lemma 7.8.2 applied to K.
We now prove the counting formula given in Theorem 9.5.5(i). While the result is found
in [219], the proof is different.
Theorem 9.12.5 If n ≡ 0 (mod 8), then there are

∏_{i=0}^{n/2−2} (2^i + 1)

Type II codes of length n.
Proof: Let μ_{n,k} be the number of [n, k] doubly-even codes containing 1. Clearly, μ_{n,1} = 1.
As in the proof of Theorem 9.5.1, we find a recurrence relation for μ_{n,k}. We illustrate
the recursion process by computing μ_{n,2}. By Corollary 9.12.3, there are 2^{n−2} + 2^{n/2−1} − 2
doubly-even vectors that are neither 0 nor 1. Each of these vectors is in a unique doubly-even [n, 2] code containing 1; each such code contains two of these vectors. So μ_{n,2} =
2^{n−3} + 2^{n/2−2} − 1.
Let C be any [n, k] doubly-even code containing 1. As C is self-orthogonal, if C′ is any
vector space complement of span{1} in C, then C = span{1} ⊥ C′.
By Lemma 9.12.1, applied to the [n, k − 1] code C′, C′ ⊆ H_1 ⊥ · · · ⊥ H_{k−1} where the H_i are
H-planes. Let E be any vector space complement of 1 in E_n containing H_1 ⊥ · · · ⊥ H_{k−1}.
Clearly, E_n = span{1} ⊥ E and E ∩ E^⊥ = {0}. Note that E has even dimension n − 2. Now
apply Lemmas 7.8.4 and 9.12.4 inductively, initially beginning with C_1 = H_1 ⊥ · · · ⊥ H_{k−1}
and always using C_2 = E in each step of the induction. Then there exist H_k, . . . , H_{n/2−1}
which are either H-planes or A-planes such that E = H_1 ⊥ · · · ⊥ H_{n/2−1}. Thus E_n =
span{1} ⊥ H_1 ⊥ · · · ⊥ H_{n/2−1}. By Exercise 418, we may assume that at most one H_i is an
A-plane. But, if one is an A-plane, then E_n has form A, contradicting Theorem 9.12.2.
If we want to extend C to an [n, k + 1] doubly-even code C′′, we must add a nonzero
doubly-even vector c′′ from D = H_k ⊥ · · · ⊥ H_{n/2−1}. By Lemma 7.8.4, D ∩ D^⊥ = {0}. By
Theorem 7.8.6, there are 2^{n−2k−1} + 2^{n/2−k−1} doubly-even vectors in D, implying that there
are 2^{n−2k−1} + 2^{n/2−k−1} − 1 choices for c′′. However, every such C′′ has 2^k − 1 doubly-even
subcodes of dimension k containing 1 by Exercise 522(a). This shows that

μ_{n,k+1} = [(2^{n−2k−1} + 2^{n/2−k−1} − 1)/(2^k − 1)] μ_{n,k}.
When k = 1, we obtain μ_{n,2} = 2^{n−3} + 2^{n/2−2} − 1, in agreement with the value we obtained
above. Using this recurrence relation, the number of Type II codes of length n is

μ_{n,n/2} = [(2^1 + 2^0 − 1)/(2^{n/2−1} − 1)] · [(2^3 + 2^1 − 1)/(2^{n/2−2} − 1)] · [(2^5 + 2^2 − 1)/(2^{n/2−3} − 1)] · · · [(2^{n−3} + 2^{n/2−2} − 1)/(2^1 − 1)] · μ_{n,1}
= [(2^1 + 2^0 − 1)/(2^1 − 1)] · [(2^3 + 2^1 − 1)/(2^2 − 1)] · [(2^5 + 2^2 − 1)/(2^3 − 1)] · · · [(2^{n−3} + 2^{n/2−2} − 1)/(2^{n/2−1} − 1)]
= 2(2^1 + 1)(2^2 + 1) · · · (2^{n/2−2} + 1),

which is the desired result.
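The recurrence and the closed form are easy to check against each other numerically; for n = 8 both give 30, which is the number of distinct [8, 4] Type II codes (a verification sketch):

```python
# Run the recurrence mu_{n,k+1} = [(2^(n-2k-1) + 2^(n/2-k-1) - 1)/(2^k - 1)] mu_{n,k}
# from mu_{n,1} = 1 and compare mu_{n,n/2} with the closed-form product.
for n in (8, 16, 24, 32):
    mu = 1                                    # mu_{n,1}
    for k in range(1, n // 2):
        num = 2**(n - 2*k - 1) + 2**(n // 2 - k - 1) - 1
        assert (mu * num) % (2**k - 1) == 0   # the division is always exact
        mu = (mu * num) // (2**k - 1)
    closed = 1
    for i in range(n // 2 - 1):               # product of 2^i + 1, i = 0, ..., n/2 - 2
        closed *= 2**i + 1
    assert mu == closed
```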
Exercise 567 Prove that if n ≡ 0 (mod 8), then there are

∏_{i=1}^{k−1} (2^{n−2i−1} + 2^{n/2−i−1} − 1)/(2^i − 1)

[n, k] doubly-even binary codes containing 1 for any k ≤ n/2.
In Theorem 9.5.5(ii), there is also a counting formula for the number of Type II codes
containing a given doubly-even code. We leave its proof, which follows along the lines of
the proof of Theorem 9.12.5, to Exercise 568.
Exercise 568 Prove Theorem 9.5.5(ii).
The proof of the counting formula for the number of [n, (n/2) − 1] doubly-even codes
when n ≡ 4 (mod 8), found in Theorem 9.5.7(i), is analogous to the proof of Theorem 9.12.5. (The counting formula for the number of [n, (n/2) − 1] doubly-even codes
when n ≡ 2 (mod 4) is found in Theorem 9.5.7(ii) and proved in Section 9.5.) For the
proofs of the counting formulas for the number of Type III and Type IV codes, we refer the
reader to [217, 259, 260].
Exercise 569 Prove Theorem 9.5.7(i).
10
Some favorite self-dual codes
In this chapter we examine the properties of the binary and ternary Golay codes, the hexacode, and the Pless symmetry codes. The Golay codes and the hexacode have similar
properties while the Pless symmetry codes generalize the [12, 6, 6] extended ternary Golay
code. We conclude the chapter with a section showing some of the connections between
these codes and lattices.
10.1
The binary Golay codes
In this section we examine in more detail the binary Golay codes of lengths 23 and 24. We
have established the existence of a [23, 12, 7] and a [24, 12, 8] binary code in Section 1.9.1.
Recall that our original construction of the [24, 12, 8] extended binary Golay code used
a bordered reverse circulant generator matrix, and the [23, 12, 7] code was obtained by
puncturing. Since then we have given different constructions of these codes, both of which
were claimed to be unique codes of their length, dimension, and minimum distance. We
first establish this uniqueness.
10.1.1 Uniqueness of the binary Golay codes
Throughout this section let C be a (possibly nonlinear) binary code of length 23 and minimum distance 7 containing M ≥ 2^12 codewords, one of which is 0. In order to prove the
uniqueness of C, we first show it has exactly 2^12 codewords and is perfect. We then show it
has a uniquely determined weight distribution and is in fact linear. This proof of linearity
follows along the lines indicated by [244].
Lemma 10.1.1 C is a perfect code with 2^12 codewords.
Proof: By the Sphere Packing Bound,

M[(23 choose 0) + (23 choose 1) + (23 choose 2) + (23 choose 3)] ≤ 2^23,   (10.1)

implying M ≤ 2^12. Hence M = 2^12, and we have equality in (10.1), which indicates that C
is perfect.
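The arithmetic behind (10.1) is worth seeing once: the ball of radius 3 in F_2^23 contains exactly 2^11 vectors, so 2^12 disjoint balls fill the whole space (a quick check):

```python
from math import comb

# |ball of radius 3| in F_2^23: 1 + 23 + 253 + 1771 = 2048 = 2^11
ball = sum(comb(23, i) for i in range(4))
assert ball == 2**11
# equality in the Sphere Packing Bound: 2^12 balls of radius 3 tile F_2^23
assert 2**12 * ball == 2**23
```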
Lemma 10.1.2 Let Ai = Ai (C) be the weight distribution of C. Then
A0 = A23 = 1, A7 = A16 = 253, A8 = A15 = 506, A11 = A12 = 1288.
Proof: As C is perfect, the spheres of radius 3 centered at codewords are disjoint and
cover F_2^23. In this proof, let N(w, c) denote the number of vectors in F_2^23 of weight w in
a sphere of radius 3 centered at a codeword of weight c; note that N(w, c) is independent
of the codeword chosen. Note also that N(w, c) = 0 when w and c differ by more than 3.
Since the spheres of radius 3 pack F_2^23 disjointly, we have

Σ_{c=0}^{23} N(w, c)Ac = (23 choose w),   (10.2)

as every vector in F_2^23 of weight w is in exactly one sphere of radius 3 centered at a codeword.
The verification of the weight distribution merely exploits (10.2) repeatedly.
Clearly, A0 = 1 and Ai = 0 for 1 ≤ i ≤ 6. We now compute A7, A8, . . . , A23 in order.
The strategy when computing Ai is to count the number of vectors in F_2^23 of weight w = i − 3
that are in spheres of radius 3 centered at codewords and then use (10.2).
Computation of A7: Any vector x ∈ F_2^23 with wt(x) = 4 must be in a unique sphere centered
at some codeword c with wt(c) = 7. Clearly, supp(x) ⊂ supp(c), showing that N(4, 7) = (7 choose 4).
Hence, (10.2) becomes N(4, 7)A7 = (7 choose 4)A7 = (23 choose 4), implying A7 = 253.
Computation of A8: Any vector x ∈ F_2^23 with wt(x) = 5 must be in a unique sphere centered at some codeword c with wt(c) = 7 or 8. In either case, supp(x) ⊂ supp(c), showing
that N(5, 7) = (7 choose 5) and N(5, 8) = (8 choose 5). Equation (10.2) becomes N(5, 7)A7 + N(5, 8)A8 =
(7 choose 5)A7 + (8 choose 5)A8 = (23 choose 5), yielding A8 = 506.
Computation of A9: Any vector x ∈ F_2^23 with wt(x) = 6 must be in a unique sphere centered at some codeword c with wt(c) = 7, 8, or 9. If wt(c) = 7, either supp(x) ⊂ supp(c) or
supp(x) overlaps supp(c) in five coordinates. Thus N(6, 7) = (7 choose 6) + (7 choose 5)(16 choose 1) = 343. If
wt(c) = 8 or 9, supp(x) ⊂ supp(c). Hence, N(6, 8) = (8 choose 6) = 28 and N(6, 9) = (9 choose 6) = 84.
Equation (10.2) is N(6, 7)A7 + N(6, 8)A8 + N(6, 9)A9 = 343 · 253 + 28 · 506 + 84A9 = (23 choose 6).
Thus A9 = 0.
The remaining values of Ai can be computed in a similar way. We only give the equations
that arise when 10 ≤ i ≤ 12, and leave their derivation and solution as exercises.
Computation of A10: [1 + (7 choose 6)(16 choose 1)]A7 + [(8 choose 7) + (8 choose 6)(15 choose 1)]A8 + (10 choose 7)A10 = (23 choose 7).
Computation of A11: [(16 choose 1) + (7 choose 6)(16 choose 2)]A7 + [1 + (8 choose 7)(15 choose 1)]A8 + (11 choose 8)A11 = (23 choose 8).
Computation of A12: (16 choose 2)A7 + [(15 choose 1) + (8 choose 7)(15 choose 2)]A8 + (11 choose 9)A11 + (12 choose 9)A12 = (23 choose 9).
The reader is invited to complete the proof.
We remark that we do not actually have to calculate the weight distribution of C; examining the proof, we see that each Ai is uniquely determined. Since we know that the [23, 12, 7] binary Golay
code G 23 exists, C must have the same weight distribution as G 23, which was presented in
Exercise 403.
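The repeated use of (10.2) in the proof can be mechanized: since the equation for weight w involves Ac only for |c − w| ≤ 3, each equation determines one new coefficient. The sketch below computes N(w, c) by counting flip patterns and solves the system (the names N and A are local choices):

```python
from math import comb

n, t = 23, 3
A = [0] * (n + 1)
A[0] = 1  # the zero codeword; A_1 = ... = A_6 = 0 since the minimum distance is 7

def N(w, c):
    # Number of weight-w vectors within distance t of a fixed weight-c vector:
    # flip j ones to zeros and k zeros to ones, with j + k <= t and c - j + k = w.
    return sum(comb(c, j) * comb(n - c, k)
               for j in range(t + 1) for k in range(t + 1)
               if j + k <= t and c - j + k == w)

# The sphere-covering identity for weight w determines A_{w+t}.
for w in range(4, n - t + 1):
    known = sum(N(w, c) * A[c] for c in range(w + t))
    A[w + t] = (comb(n, w) - known) // N(w, w + t)

assert A[7] == A[16] == 253 and A[8] == A[15] == 506
assert A[11] == A[12] == 1288 and A[23] == 1
assert sum(A) == 2**12   # all 2^12 codewords accounted for
```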
Exercise 570 In the proof of Lemma 10.1.2,
(a) verify and solve the equations for Ai with 10 ≤ i ≤ 12, and
(b) find and solve the equations for Ai with 13 ≤ i ≤ 23.
Exercise 571 Use the technique presented in the proof of Lemma 10.1.2 to compute
the weight distribution of the perfect [15, 11, 3] binary Hamming code H4 . Compare this
answer to that found in Exercise 387.
Lemma 10.1.3 Let Ĉ be obtained from C by adding an overall parity check. Then Ĉ is a
[24, 12, 8] linear Type II code, and C is also linear.
Proof: By Lemma 10.1.2 every codeword of Ĉ is doubly-even. We show that distinct
codewords are orthogonal. For c ∈ C, let ĉ be its extension. Suppose that ĉ1 and ĉ2 are in
Ĉ. Let w = wt(ĉ1 + ĉ2) and C 1 = c1 + C. As C has minimum distance 7, so does C 1. As
0 ∈ C 1, Lemmas 10.1.1 and 10.1.2 can be applied to C 1. Thus Ĉ 1 = ĉ1 + Ĉ is doubly-even,
implying that 4 | w. So ĉ1 and ĉ2 are orthogonal.
Let L be the linear code generated by the codewords of Ĉ; since all codewords of Ĉ are
orthogonal to each other (and to themselves as each is even), L has dimension at most 12.
But Ĉ has 2^12 codewords, implying L = Ĉ. Hence Ĉ and its punctured code C are linear.
The minimum weight of Ĉ follows from Lemma 10.1.2.
Corollary 10.1.4 Let A be a binary, possibly nonlinear, code of length 24 and minimum
distance 8 containing 0. If A has M codewords with M ≥ 2^12, then A is a [24, 12, 8] linear
Type II code.
Proof: Let C be the length 23 code obtained from A by puncturing on some coordinate i.
Then C has M codewords, minimum distance at least 7, and contains 0. By Lemmas 10.1.1
and 10.1.3, M = 2^12 and C is linear. Note that A may not be the extension of C; however,
we can puncture A on a different coordinate j to get a code C 1. The same argument applied
to C 1 shows that C 1 is linear. Thus A is linear by Exercise 572; hence it is a [24, 12, 8] code.
By Lemma 10.1.3, A is Type II if we can show that A = Ĉ. This is equivalent to showing
that A is even. Suppose that it is not. Let a ∈ A with wt(a) odd. By puncturing A on a
coordinate where a is 0, we obtain a code with weight distribution given in Lemma 10.1.2.
Thus wt(a) is 7, 11, 15, or 23. Now puncture A on a coordinate where a is 1. This punctured
code still has the weight distribution in Lemma 10.1.2, but contains a vector of weight 6,
10, 14, or 22, which is a contradiction.
Exercise 572 Let A be a possibly nonlinear binary code of minimum distance at least 2.
Choose two coordinates i and j and puncture A on each to obtain two codes Ai and A j .
Suppose that Ai and A j are both linear. Prove that A is linear.
We now know that C and Ĉ are [23, 12, 7] and [24, 12, 8] linear codes. We show the
uniqueness of Ĉ by showing the uniqueness of a 5-(24, 8, 1) design.
Theorem 10.1.5 The 5-(24, 8, 1) design is unique up to equivalence.
Theorem 10.1.5 The 5-(24, 8, 1) design is unique up to equivalence.
Proof: Let D = (P, B) be a 5-(24, 8, 1) design. The intersection numbers of D are given
in Figure 8.3; from these numbers we see that two distinct blocks intersect in either 0, 2,
or 4 points. We will construct blocks Bi for 1 ≤ i ≤ 12 and associated codewords ci where
The following codewords are written with coordinates 1–24 grouped in fours:

c1   = 1111 1111 0000 0000 0000 0000
c2   = 1111 0000 1111 0000 0000 0000
c3   = 1110 1000 1000 1110 0000 0000
c4   = 1101 1000 1000 0001 1100 0000
c5   = 1011 1000 1000 0000 0011 1000
c6   = 0111 1000 1000 0000 0000 0111
c7   = 1110 0100 1000 0001 0010 0100
c8   = 1101 0100 1000 1000 0001 0010
c9   = 1011 0100 1000 0100 1000 0001
c10  = 1100 1100 1100 0000 0000 1001
c11  = 1010 1100 1010 0000 0100 0010
c′12 = 1110 0001 1000 0000 1000 1010
c′′12 = 1110 0001 1000 0000 0101 0001

Figure 10.1 Codewords from a 5-(24, 8, 1) design.
supp(ci ) = Bi , which we give in Figure 10.1. Let P = {1, 2, . . . , 24}. Since we prove D is
unique up to equivalence, we will be reordering points where possible.
By reordering points, we can assume B1 = {1, 2, 3, 4, 5, 6, 7, 8}. We repeatedly use the
fact that any five points determine a unique block. By reordering points, we may assume that
the block containing {1, 2, 3, 4, 9} is B2 = {1, 2, 3, 4, 9, 10, 11, 12}. By similar reordering,
the block containing {1, 2, 3, 5, 9} is B3 = {1, 2, 3, 5, 9, 13, 14, 15}, the block containing
{1, 2, 4, 5, 9} is B4 = {1, 2, 4, 5, 9, 16, 17, 18}, the block containing {1, 3, 4, 5, 9} is
B5 = {1, 3, 4, 5, 9, 19, 20, 21}, and finally the block containing {2, 3, 4, 5, 9} is B6 =
{2, 3, 4, 5, 9, 22, 23, 24}. We repeatedly used the fact in constructing Bi , for 2 ≤ i ≤ 6,
that we began with five points, exactly four of which were in the previously constructed
blocks. Let B7 be the block containing {1, 2, 3, 6, 9}; B7 intersects B1 , B2 , and B3 in
no other points and intersects B4 , B5 , and B6 each in one more point. Thus we may
assume B7 = {1, 2, 3, 6, 9, 16, 19, 22}. Let B8 be the block containing {1, 2, 4, 6, 9}; B8
intersects B1 , B2 , B4 , and B7 in no other points and intersects B3 , B5 , and B6 each in
one more point. Thus we may assume B8 = {1, 2, 4, 6, 9, 13, 20, 23}. By Exercise 573,
we may assume that B9 = {1, 3, 4, 6, 9, 14, 17, 24}, B10 = {1, 2, 5, 6, 9, 10, 21, 24}, B11 =
{1, 3, 5, 6, 9, 11, 18, 23}, and B12 is either B′12 = {1, 2, 3, 8, 9, 17, 21, 23} or B′′12 =
{1, 2, 3, 8, 9, 18, 20, 24}.
For either B′12 or B′′12, the 12 weight 8 vectors whose supports are the blocks generate
a 12-dimensional code in F_2^24, as can be easily seen by row reduction. Call the resulting
codes C′ and C′′. By Example 8.3.2, the design D is held by the weight 8 codewords in
a [24, 12, 8] binary code, which is Type II by Corollary 10.1.4. Thus there are at most
two designs up to equivalence, one for each code C′ and C′′. But using a computer algebra
package, one can show that C′ ≃ C′′. Thus the design is unique up to equivalence.
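The row-reduction claim in the proof — that either choice of twelfth block yields 12 independent vectors — can be checked mechanically with a GF(2) elimination over bitmasks (a sketch; the helper names are ad hoc):

```python
# Blocks B_1, ..., B_11 and the two candidates for B_12 from the proof.
B = [[1,2,3,4,5,6,7,8],    [1,2,3,4,9,10,11,12], [1,2,3,5,9,13,14,15],
     [1,2,4,5,9,16,17,18], [1,3,4,5,9,19,20,21], [2,3,4,5,9,22,23,24],
     [1,2,3,6,9,16,19,22], [1,2,4,6,9,13,20,23], [1,3,4,6,9,14,17,24],
     [1,2,5,6,9,10,21,24], [1,3,5,6,9,11,18,23]]
B12_prime = [1,2,3,8,9,17,21,23]
B12_double_prime = [1,2,3,8,9,18,20,24]

def mask(block):                    # block -> characteristic vector in F_2^24
    return sum(1 << (i - 1) for i in block)

def rank2(vectors):                 # GF(2) rank via leading-bit elimination
    pivots = {}
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]
    return len(pivots)

for last in (B12_prime, B12_double_prime):
    assert rank2([mask(b) for b in B + [last]]) == 12
```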
Exercise 573 In the proof of Theorem 10.1.5, show that after B1 , . . . ,B8 have been determined, upon reordering points:
(a) the block B9 containing {1, 3, 4, 6, 9} is B9 = {1, 3, 4, 6, 9, 14, 17, 24},
(b) the block B10 containing {1, 2, 5, 6, 9} is B10 = {1, 2, 5, 6, 9, 10, 21, 24},
(c) the block B11 containing {1, 3, 5, 6, 9} is B11 = {1, 3, 5, 6, 9, 11, 18, 23}, and
(d) the block B12 containing {1, 2, 3, 8, 9} is either {1, 2, 3, 8, 9, 17, 21, 23} or
{1, 2, 3, 8, 9, 18, 20, 24}.
Theorem 10.1.6 Both length 23 and length 24, possibly nonlinear, binary codes each
containing 0 with M ≥ 2^12 codewords and minimum distance 7 and 8, respectively, are
unique up to equivalence. They are the [23, 12, 7] and [24, 12, 8] binary Golay codes.
Proof: Let A be the code of length 24. By Corollary 10.1.4, A is a [24, 12, 8] Type II code.
By Theorem 9.3.10, the codewords of weight 8 hold a 5-design. By Theorem 10.1.5 this
design is unique, and by the argument in the proof, the weight 8 codewords span the code.
Hence A is unique.
Let C be the code of length 23. By Lemma 10.1.3, Ĉ is a [24, 12, 8] code, which is
unique. So C is Ĉ punctured on some coordinate. The proof is complete if we show all
punctured codes are equivalent. By Example 6.6.23, the extended binary quadratic residue
code is a [24, 12, 8] code. So it is equivalent to Ĉ. The extended quadratic residue code is an
extended cyclic code and by the Gleason–Prange Theorem, its automorphism group must
be transitive; hence all punctured codes are equivalent by Theorem 1.6.6.
10.1.2 Properties of binary Golay codes
When we introduced the binary Golay codes in Section 1.9.1, we gave a generator matrix
in standard form for the [24, 12, 8] extended binary Golay code G 24 . In Example 6.6.23,
we showed that by extending either odd-like binary quadratic residue code of length
23, we obtain a [24, 12, 8] code, which by Theorem 10.1.6 is equivalent to G 24 . So one
representation of G 24 is as an extended quadratic residue code. Puncturing on the extended
coordinate gives a representation of the [23, 12, 7] binary Golay code G 23 as an odd-like
quadratic residue code of length 23.
Viewing G 23 as a quadratic residue code, there are two possible idempotents given in
Theorem 6.6.5. One of them is
e23 (x) = x + x 2 + x 3 + x 4 + x 6 + x 8 + x 9 + x 12 + x 13 + x 16 + x 18 .
One can check that the generator polynomial for this code is
g23 (x) = 1 + x + x 5 + x 6 + x 7 + x 9 + x 11 .
Label the coordinates of G 23 as {0, 1, 2, . . . , 22} in the usual numbering for cyclic codes. To
obtain G 24 , add an overall parity check in coordinate labeled ∞. In this form, we can describe
ΓAut(G 24 ) = PAut(G 24 ). PAut(G 24 ) contains the subgroup of cyclic shifts, generated by
s24 = (0, 1, 2, . . . , 22).
PAut(G 24 ) also contains a group of multipliers generated by µ2 by Theorem 4.3.13;
402
Some favorite self-dual codes
in cycle form
µ2 = (1, 2, 4, 8, 16, 9, 18, 13, 3, 6, 12)(5, 10, 20, 17, 11, 22, 21, 19, 15, 7, 14).
As G 24 is an extended quadratic residue code, the Gleason–Prange Theorem applies; the
automorphism arising from this theorem is
ν24 = (0, ∞)(1, 22)(2, 11)(3, 15)(4, 17)(5, 9)(6, 19)(7, 13)(8, 20)(10, 16)(12, 21)(14, 18).
The permutations s24 , µ2 , and ν24 generate the projective special linear group denoted
PSL2 (23). However, this is only a small subgroup of the entire group. The element
η24 = (1, 3, 13, 8, 6)(4, 9, 18, 12, 16)(7, 17, 10, 11, 22)(14, 19, 21, 20, 15)
is also in PAut(G 24 ); see [149, Theorem 6.7] for one proof of this. The elements s24 , µ2 ,
ν24 , and η24 generate PAut(G 24 ). The resulting group is the 5-fold transitive Mathieu group,
often denoted M24 , of order 244 823 040 = 24 · 23 · 22 · 21 · 20 · 48. M24 is the largest of
the five sporadic simple groups known as Mathieu groups. The subgroup of M24 fixing the
point ∞ is PAut(G 23 ). This is the 4-fold transitive Mathieu group M23 , which is a simple
group of order 10 200 960 = 23 · 22 · 21 · 20 · 48.
Exercise 574 Verify that g23 (x) is the generator polynomial for the cyclic code with
generating idempotent e23 (x).
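Exercise 574 can be carried out by machine with polynomials over F_2 encoded as bitmasks (bit i holding the coefficient of x^i); the helper names below are ad hoc. The check confirms that g23(x) divides both e23(x) and x^23 + 1, and that gcd(e23(x), x^23 + 1) = g23(x), which is exactly the statement that g23(x) generates the cyclic code with generating idempotent e23(x):

```python
# Polynomials over F_2 as integer bitmasks: bit i = coefficient of x^i.
def gf2_mod(a, b):
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_mod(a, b)
    return a

e23 = sum(1 << i for i in (1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18))
g23 = sum(1 << i for i in (0, 1, 5, 6, 7, 9, 11))
x23_1 = (1 << 23) | 1            # x^23 + 1 = x^23 - 1 over F_2

assert gf2_mod(e23, g23) == 0    # g23(x) divides e23(x)
assert gf2_mod(x23_1, g23) == 0  # g23(x) divides x^23 + 1
assert gf2_gcd(e23, x23_1) == g23  # so g23(x) is the generator polynomial
```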
Exercise 575 Verify that η24 ∈ PAut(G 24 ). One way to do this is to recognize that G 24 is
self-dual. To show that η24 ∈ PAut(G 24 ), it suffices to show that the image of a basis (such
as one obtained by using cyclic shifts of g23 (x) and adding a parity check) under η24 gives
vectors orthogonal to this basis.
We also know the covering radius of each Golay code.
Theorem 10.1.7 The covering radii of the [23, 12, 7] and [24, 12, 8] binary Golay codes
are 3 and 4, respectively.
Proof: As the [23, 12, 7] code is perfect by Lemma 10.1.1, its covering radius is 3. The
[24, 12, 8] extended binary Golay code has covering radius 4 by Example 8.3.2.
10.2
Permutation decoding
Permutation decoding is a technique, developed in a 1964 paper by F. J. MacWilliams
[214], which uses a fixed set of automorphisms of the code to assist in decoding a received
vector. The idea of permutation decoding is to move all the errors in a received vector out
403
10.2 Permutation decoding
of the information positions, using an automorphism of the code, so that the information
symbols in the permuted vector are correct. Applying the parity check equations to these now
correct information symbols then yields a vector that must be the correct codeword,
only permuted. Finally, apply the inverse of the automorphism to this new codeword and
obtain the codeword originally transmitted. This technique has been successfully used as a
decoding method for the binary Golay codes.
To use permutation decoding on a t-error-correcting code we must settle two questions.
First, how can you guarantee that all of the errors are moved out of the information positions?
And second, how do you find a subset S of the automorphism group of the code that moves
the nonzero entries in every possible error vector of weight t or less out of the information
positions? Such a set is called a PD-set. The first question has a simple answer using the
syndrome, as seen in the following theorem from [218]. The second question is not so easy
to answer.
Theorem 10.2.1 Let C be an [n, k] t-error-correcting linear code with parity check matrix
H having the identity matrix In−k in the redundancy positions. Suppose y = c + e is a
vector with c ∈ C and wt(e) ≤ t. Then the information symbols in y are correct if and only
if wt(syn(y)) ≤ t, where syn(y) = H y^T.
Proof: As applying a permutation to the coordinates of C does not change weights of
codewords, we may assume using Theorem 1.6.2 that the generator matrix G of C is
in standard form [Ik A] and that H = [−AT In−k ]. Hence the first k coordinates are
an information set. If the information symbols in y are correct, then H y^T = H e^T, which
consists of the last n − k coordinates of e since the first k coordinates of e are 0. Thus wt(syn(y)) ≤ t. Conversely, suppose that
some of the information symbols in y are incorrect. Write e = e1 e2 , where e1 = e1 · · · ek ,
which is nonzero, and e2 = ek+1 · · · en . By Exercise 576, wt(x + z) ≥ wt(x) − wt(z); so
we have
wt(H y^T) = wt(H e^T) = wt(−A^T e1^T + e2^T)
  ≥ wt(−A^T e1^T) − wt(e2^T) = wt(e1 A) − wt(e2)
  = wt(e1 Ik) + wt(e1 A) − [wt(e1) + wt(e2)] = wt(e1 G) − wt(e).
As e1 ≠ 0, e1 G is a nonzero codeword and hence has weight at least 2t + 1. Thus wt(H y^T) ≥
2t + 1 − t = t + 1.
Exercise 576 Prove that if x and z are in F_q^n, then wt(x + z) ≥ wt(x) − wt(z).
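The criterion of Theorem 10.2.1 is easy to check on a small example. The sketch below uses the [7, 4, 3] binary Hamming code with t = 1 (the matrix A is one standard choice, ours rather than one from the text) and verifies that a single error gives syndrome weight at most t exactly when it avoids the information positions:

```python
# Illustration of Theorem 10.2.1 on the [7,4,3] binary Hamming code (t = 1).
k, n = 4, 7
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]          # G = [I4 A], H = [A^T I3]

def syndrome(y):
    """syn(y) = H y^T over F2, with the identity in the redundancy positions."""
    return [(sum(A[j][i] * y[j] for j in range(k)) + y[k + i]) % 2
            for i in range(n - k)]

# For y = c + e with wt(e) <= t, syn(y) = syn(e).  The information symbols of y
# are correct exactly when wt(syn(y)) <= t = 1.
for i in range(n):
    e = [0] * n
    e[i] = 1                      # a single error in position i
    w = sum(syndrome(e))
    assert (w <= 1) == (i >= k)   # weight <= t iff the error avoids the information positions
```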
Using this theorem, the Permutation Decoding Algorithm can be stated. Begin with an
[n, k] t-error-correcting code C having a parity check matrix H that has In−k in the redundancy positions. So the generator matrix G associated with H has Ik in the k information
positions; the map u → uG, defined for any length k message u, is a systematic encoder
for C. Let S = {σ1 , . . . , σ P } be a PD-set for C. The Permutation Decoding Algorithm
then proceeds as follows:
I. For a received vector y, compute wt(H (yσi )^T) for i = 1, 2, . . . until an i is found so
that this weight is t or less.
II. Extract the information symbols from yσi , and apply the parity check equations to the
string of information symbols to obtain the redundancy symbols. Form a codeword c
from these information and redundancy symbols.
III. y is decoded to the codeword cσi^{-1}.
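As a small concrete illustration, the sketch below runs these three steps on the cyclic [7, 4, 3] binary Hamming code with generator polynomial 1 + x + x^3, using the cyclic shifts by 0, 3, and 4 as a PD-set; the code, the matrix A, and the PD-set are our own example for t = 1, not taken from the text.

```python
from itertools import product

# Cyclic [7,4,3] Hamming code, g(x) = 1 + x + x^3; information positions 0..3,
# redundancy positions 4..6.  A was computed from g(x), so G = [I4 A], H = [A^T I3].
k, n, t = 4, 7, 1
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1],
     [1, 0, 1]]

def encode(u):                  # systematic encoder u -> uG
    return list(u) + [sum(u[j] * A[j][i] for j in range(k)) % 2 for i in range(n - k)]

def syn_weight(y):              # wt(H y^T)
    return sum((sum(A[j][i] * y[j] for j in range(k)) + y[k + i]) % 2
               for i in range(n - k))

# PD-set: cyclic shifts by 0, 3, 4 (automorphisms of a cyclic code).  Every single
# position is carried into the redundancy positions {4,5,6} by one of these shifts.
SHIFTS = [0, 3, 4]

def decode(y):
    for s in SHIFTS:
        z = [0] * n
        for i in range(n):
            z[(i + s) % n] = y[i]       # z = y sigma^s
        if syn_weight(z) <= t:          # Step I: errors are outside the information positions
            c = encode(z[:k])           # Step II: re-encode the correct information symbols
            return [c[(i + s) % n] for i in range(n)]   # Step III: apply sigma^{-s}
    return None                          # more than t errors occurred

# Exhaustive check: every codeword with any single error is decoded correctly.
for u in product([0, 1], repeat=k):
    c = encode(list(u))
    for p in range(n):
        y = c[:]
        y[p] ^= 1
        assert decode(y) == c
```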
We now turn to the question of finding PD-sets. Clearly, the Permutation Decoding
Algorithm is more efficient the smaller the size of the PD-set. In [107] a lower bound on
this size is given. We will see shortly that a PD-set for the [24, 12, 8] extended binary Golay
code exists whose size equals this lower bound. Suppose S = {σ1 , . . . , σ P } is a PD-set
for a t-error-correcting [n, k] code with redundancy locations R of size r = n − k. Let
Ri be the image of R under the permutation part of σi^{-1} for 1 ≤ i ≤ P. Because S is a
PD-set, every subset of coordinates of size e ≤ t is contained in some Ri . In particular,
the sets R1 , . . . , R P cover all t-element subsets of the n-element set of coordinates. Let
N (t, r, n) be the minimum number of r-element subsets of an n-element set Ω that cover all
t-element subsets of Ω. Our discussion shows that P ≥ N (t, r, n). If R1 , . . . , R N with N =
N (t, r, n) is a collection of r-element subsets of Ω covering the t-element subsets, then form
the set
X = {(Ri , ω) | 1 ≤ i ≤ N and ω ∈ Ri }.
Clearly, as |Ri | = r, |X| = r N (t, r, n). We compute a lower bound on |X| in another manner.
If ω ∈ Ω, let Xω = {(Ri , ω) ∈ X}. Notice that |Xω| ≥ N (t − 1, r − 1, n − 1) because
the sets obtained from the Ri with (Ri , ω) ∈ Xω, by deleting ω, produce a collection of
(r − 1)-element sets that cover all (t − 1)-element subsets of Ω \ {ω}. Hence there are at
least n N (t − 1, r − 1, n − 1) pairs in X, and so
N (t, r, n) ≥ (n/r) N (t − 1, r − 1, n − 1).
By repeated application of this inequality together with the observation that N (1, r, n) =
⌈n/r ⌉ we obtain the following.
Theorem 10.2.2 ([107]) A PD-set of size P for a t-error-correcting [n, k] code with
redundancy r = n − k satisfies
P ≥ ⌈(n/r) ⌈((n − 1)/(r − 1)) ⌈ · · · ⌈(n − t + 1)/(r − t + 1)⌉ · · · ⌉⌉⌉.
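The nested ceilings are convenient to evaluate with the recursion used in the proof, starting from N(1, r, n) = ⌈n/r⌉; a short sketch (function name ours):

```python
def pd_bound(t, r, n):
    """Right-hand side of Theorem 10.2.2, computed innermost ceiling first
    via the recursion N(t, r, n) >= ceil((n/r) N(t-1, r-1, n-1))."""
    b = 1
    for i in range(t - 1, -1, -1):
        b = ((n - i) * b + (r - i) - 1) // (r - i)   # integer ceiling of (n-i)*b/(r-i)
    return b

assert pd_bound(1, 12, 24) == 2    # just ceil(24/12)
assert pd_bound(3, 12, 24) == 14   # the bound for the [24,12,8] Golay code
```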
Example 10.2.3 By Theorem 10.2.2 we find that a PD-set for the [24, 12, 8] extended
binary Golay code has size at least 14. The permutations that follow form the PD-set found
in [107] where the code is written as an extended quadratic residue code; here the generator
polynomial of the (nonextended) cyclic quadratic residue code is 1 + x^2 + x^4 + x^5 + x^6 +
x^10 + x^11, and the extended coordinate is labeled ∞.
(1)
(0, 14)(1, 11)(2, 19)(3, 17)(4, 16)(5, 15)(6, 12)(7, 18)(8, 20)(9, 21)(10, 13)(22, ∞)
(0, 16, 20, 13, 7, 4, 8)(1, 22, 11, 9, 18, ∞, 21, 12, 6, 17, 5, 15, 14, 19)(2, 10)
(0, 4, 15, 3, 14, 1, 9)(2, 17, 21, 6, 10, 11, 22)(5, 8, 18, 20, 7, 16, ∞)
(2, 20, 16, 21, 13, 6, 4, 12)(3, 19, 5, 8, 15, 14, 18, 9)(7, 17)(10, 22, 11, ∞)
(0, 11, 16, 6, 18, 19, 20, 9, 13, 22, 5, 17, 12, 14, 10, 8, 2, 7, 1, 21, ∞, 15, 4)
(0, 6, 20, 14, 8, 9, 13, 11, 17, 2, 19)(1, 5, 21, 12, 16, 3, 15, ∞, 4, 18, 10)
(0, 20, 4, 7, 12, 5, 6)(1, 15, 16, 17, 18, 21, 3)(2, 10, 22, ∞, 13, 8, 11)
(0, 6, 16, 1, 9, 17, 19, 11, 8, 22, 14, ∞, 5, 15, 2)(4, 10, 20, 21, 12)(7, 18, 13)
(0, 16, 4, 20, 19, 17, 18, 15, 3, 22, 11, 9)(1, 14, 10, ∞, 21, 12, 6, 7, 8, 5, 2, 13)
(0, 5, 11, 2, 21, 3, 12)(1, 8, 10, 6, 18, 22, 9, 13, 4, 14, 15, 20, 7, ∞)(16, 17)
(0, 15, 1, 12, 20, 19, 3, 9, 2, 10, 18, ∞, 21, 17)(4, 8, 16, 6, 7, 11, 14)(13, 22)
(0, 4, 5, 11, 15, 13, 19)(1, 7, 22, 2, 3, 8, 20)(6, 14, 18, 9, 17, 10, 21)
(0, 13, 18, 1, 21, 7, 6, ∞, 11, 16, 15, 17, 8, 4, 12, 20, 9, 3, 19, 10, 2, 14, 22)
These permutations were found by computer using orbits of M24 on certain sets.
10.3
The hexacode
The hexacode, first presented in Example 1.3.4, is the unique [6, 3, 4] Hermitian self-dual
code over F4 . This MDS code possesses many of the same properties as the Golay codes
including being extended perfect and having a representation as an extended quadratic
residue code.
10.3.1 Uniqueness of the hexacode
Let C be a [6, 3, 4] code over F4 with generator matrix G, which we may assume is in
standard form [I3 A] where A is a 3 × 3 matrix. As the minimum weight is 4, every entry
in A is nonzero. By scaling the columns, we may assume that the first row of A is 111. By
rescaling columns 2 and 3 of I3 , we may also assume that each entry in the first column
of A is 1. If two entries in row two of A are equal to α, then α times row one of G added
to row two of G gives a codeword of weight less than 4, which is a contradiction. Hence
row two, and analogously row three, each have distinct entries. As rows two and three of A
cannot be equal to each other, we may assume that
    1  1  1
A = 1  ω  ω̄        (10.3)
    1  ω̄  ω.
It is easy to see that G generates a Hermitian self-dual code and hence is equivalent to
the hexacode. Notice that G is a bordered double circulant generator matrix (and a bordered
reverse circulant generator matrix) for C. This proves the following result.
Theorem 10.3.1 Let C be a [6, 3, 4] code over F4 . Then C is unique and is equivalent to
the hexacode.
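The ingredients of this uniqueness argument are easy to confirm computationally. The sketch below is our own check, with F4 = {0, 1, ω, ω̄} encoded as the integers {0, 1, 2, 3} (so that addition is bitwise XOR and ω̄ = ω²); it spans the 64 codewords of [I3 A] and verifies the minimum weight and Hermitian self-orthogonality:

```python
from itertools import product

# F4 as {0,1,2,3}: 1 <-> 1, 2 <-> omega, 3 <-> omega-bar; F4 addition is XOR.
def mul(a, b):
    if a == 0 or b == 0:
        return 0
    val, exp = [1, 2, 3], {1: 0, 2: 1, 3: 2}    # nonzero elements as powers of omega
    return val[(exp[a] + exp[b]) % 3]

conj = {0: 0, 1: 1, 2: 3, 3: 2}                 # conjugation x -> x^2 swaps omega, omega-bar

# G = [I3 A] with A as in (10.3), bar placement as reconstructed in the text.
G = [[1, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 2, 3],
     [0, 0, 1, 1, 3, 2]]

# Span the code: 4^3 = 64 codewords.
code = set()
for a, b, c in product(range(4), repeat=3):
    code.add(tuple(mul(a, G[0][i]) ^ mul(b, G[1][i]) ^ mul(c, G[2][i])
                   for i in range(6)))

assert len(code) == 64
assert min(sum(x != 0 for x in w) for w in code if any(w)) == 4   # minimum weight 4

def herm(u, v):                                  # Hermitian inner product <u, v-bar>
    s = 0
    for x, y in zip(u, v):
        s ^= mul(x, conj[y])
    return s

assert all(herm(G[i], G[j]) == 0 for i in range(3) for j in range(3))  # self-orthogonal rows
```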
Exercise 577 Give an equivalence that maps the generator matrix for the hexacode as
given in Example 1.3.4 to the generator matrix [I3 A], where A is given in (10.3).
10.3.2 Properties of the hexacode
The extended binary and ternary Golay codes of lengths 24 and 12, respectively, were each
extended quadratic residue codes. The same result holds for the hexacode by Exercise 363.
In addition, the hexacode is an MDS code and by Exercise 578, the [5, 3, 3] code obtained
by puncturing the hexacode is a perfect code.
Exercise 578 Do the following:
(a) Show that a [5, 3, 3] code over F4 is perfect.
(b) Show that a [5, 3, 3] code over F4 is unique and that its extension is the hexacode.
(c) Show that the hexacode is an extended Hamming code over F4 .
By Theorem 6.6.6 (see Example 6.6.8 and Exercise 363), one [5, 3, 3] odd-like quadratic
residue code over F4 has generating idempotent
e5(x) = 1 + ωx + ω̄x^2 + ω̄x^3 + ωx^4;
the associated generator polynomial is
g5(x) = 1 + ωx + x^2.
Extending this code we obtain the hexacode G 6 . Labeling the coordinates 0, 1, 2, 3, 4, ∞,
the permutation
s6 = (0, 1, 2, 3, 4)
is in ΓAut(G6). Furthermore, the map
η6 = diag(1, ω, ω, ω, ω, 1) × (0, ∞) × σ2 ,
where σ2 is the Frobenius map interchanging ω and ω̄, is in ΓAut(G6). These two elements
generate ΓAut(G6), which is a group of order 2160. The group ΓAutPr(G6) = {P | DP ∈
MAut(G6)} (the projection of ΓAut(G6) onto its permutation parts defined in Section 1.7)
is the symmetric group Sym6.
Exercise 579 Verify that g5(x) is the generator polynomial for the cyclic code with
generating idempotent e5(x).
Exercise 580 Verify that η6 is in ΓAut(G6).
The covering radius of the punctured hexacode is 1 as it is a perfect [5, 3, 3] code.
Hence by Theorem 1.12.6, the hexacode has covering radius at most 2. In fact we have the
following.
Theorem 10.3.2 The covering radius of the hexacode is 2.
Exercise 581 Prove Theorem 10.3.2.
10.3.3 Decoding the Golay code with the hexacode
In Section 10.2, we presented a method to decode the [24, 12, 8] extended binary Golay
code. To be used effectively, this decoding must be done by computer. Here we present
another method for decoding, found in [268], which can certainly be programmed on a
computer but can also be done by hand. We use a version of the [24, 12, 8] extended binary
Golay code defined in terms of the hexacode.
We first establish some notation. Let v = v1 v2 · · · v24 ∈ F_2^24. We rearrange v into a 4 × 6
array

v1 v5 · · · v21
v2 v6 · · · v22
v3 v7 · · · v23        (10.4)
v4 v8 · · · v24,
which we denote as [v]. The parity of a column of [v] is even or odd if the binary sum of the
entries in that column is 0 or 1, respectively. There is an analogous definition for the parity
of a row. The set of vectors written as 4 × 6 matrices that satisfy two criteria will form the
[24, 12, 8] extended binary Golay code. The second criterion involves the hexacode. The version of the hexacode G 6 used throughout the remainder of this section has generator matrix
     1 0 0 1 ω̄ ω
G6 = 0 1 0 1 ω ω̄ ,
     0 0 1 1 1 1
which is obtained from the generator matrix [I3 A], where A is given in (10.3), by
interchanging the first and third columns of [I3 A]. This form is used because it is easy to
identify the 64 codewords in the hexacode. To use the hexacode we send an element of F_2^24
into an element of F_4^6 by defining the projection Pr(v) of v into F_4^6 as the matrix product
Pr(v) = [0 1 ω ω̄][v].
Example 10.3.3 Suppose that v = 111111000000001101010110. The projection operation
can be visualized by writing

0 | 1 1 0 0 0 0
1 | 1 1 0 0 1 1
ω | 1 0 0 1 0 1
ω̄ | 1 0 0 1 1 0

Pr(v) = 0 1 0 1 ω ω̄,

where the projection is accomplished by multiplying the first row of [v] by 0, the second
row by 1, the third row by ω, and the fourth row by ω̄, and then adding.
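The projection is mechanical to compute. In the sketch below, ω and ω̄ are encoded as the integers 2 and 3 (our choice of encoding), so that F4 addition is bitwise XOR and multiplying a row by its label reduces to a conditional add:

```python
# Projection Pr(v) = [0 1 omega omega-bar][v] for v in F_2^24,
# with F4 = {0,1,2,3}, 2 = omega, 3 = omega-bar; F4 addition is XOR.
ROW_WEIGHTS = [0, 1, 2, 3]

def project(v):
    cols = [v[4 * j:4 * j + 4] for j in range(6)]   # columns of the 4x6 array [v]
    out = []
    for col in cols:
        s = 0
        for w, bit in zip(ROW_WEIGHTS, col):
            if bit:
                s ^= w        # add this row's label whenever the bit is 1
        out.append(s)
    return out

v = [int(b) for b in "111111000000001101010110"]
assert project(v) == [0, 1, 0, 1, 2, 3]             # 0 1 0 1 omega omega-bar
```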
Recall that the complement of a binary vector v is the binary vector v + 1 obtained by
replacing the 0s in v by 1s and the 1s by 0s. We will repeatedly use the following lemma
without reference; its proof is left as an exercise.
Lemma 10.3.4 The following hold for columns of (10.4):
(i) Given α ∈ F4 , there are exactly four possible choices for a column that projects to α.
Exactly two choices have even parity and two have odd parity. The two of even parity
are complements of each other, as are the two of odd parity.
(ii) The columns projecting to 0 ∈ F4 are [0000]^T, [1111]^T, [1000]^T, and [0111]^T.
Exercise 582 Prove Lemma 10.3.4.
Lemma 10.3.5 Let C be the binary code consisting of all vectors c such that:
(i) all columns of [c] have the same parity and that parity equals the parity of the first row
of [c], and
(ii) the vector Pr(c) ∈ G 6 .
The code C is the [24, 12, 8] extended binary Golay code.
Proof: If we fill in the first three rows of the matrix (10.4) arbitrarily, then the last row
is uniquely determined if (i) is to be satisfied. The set of vectors satisfying (i) is clearly a
linear space. Hence the set of vectors C 1 satisfying (i) is a [24, 18] code. Similarly, if we fill
in the first three columns of the matrix (10.4) arbitrarily, then these columns project onto
three arbitrary elements of F4 . Since the first three coordinates of the hexacode are the three
information positions, the last three coordinates of the projection are uniquely determined
if (ii) is to be satisfied. But there are precisely four choices for a column if it is to project to
some fixed element of F4 . Thus the set of vectors satisfying (ii) has size 2^12 · 4^3 = 2^18. This
set C 2 is clearly a linear space as Pr is a linear map and G 6 is closed under addition. Thus
C 2 is also a [24, 18] code. Therefore C = C 1 ∩ C 2 has dimension at least 12.
By Theorem 10.1.6, the proof is completed if we show that C has minimum distance 8.
The vector 1111111100 · · · 0 satisfies (i) and projects onto the zero vector. Hence C has
minimum distance at most 8. Suppose c ∈ C is a nonzero codeword. There are two cases:
the top row of [c] has even parity or the top row has odd parity.
Suppose that the top row of [c] has even parity. If Pr(c) ≠ 0, then as G6 has minimum
weight 4, there are at least four nonzero columns in [c] all with a (nonzero) even number
of 1s. Therefore wt(c) ≥ 8. Suppose now that Pr(c) = 0. The only column vectors of even
parity that project to 0 ∈ F4 are the all-zero column and the all-one column. Since the top
row of [c] has even parity, [c] must contain an even number of all-one columns. As c ≠ 0,
we again have wt(c) ≥ 8.
Suppose now that the top row of [c] has odd parity. Thus every column of [c] has an odd
number of 1s and so there are at least six 1s in [c]; we are done if one column has three 1s.
So assume that every column of [c] has exactly one 1. In particular, the top row of [c] has
a 1 in a given column precisely when that column projects to 0 ∈ F4 . Thus the number of
1s in the top row is 6 − wt(Pr(c)). However, the codewords of G 6 have even weight and so
the top row of [c] has even parity, which is a contradiction.
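A membership test following the two criteria of Lemma 10.3.5 is straightforward to mechanize. In the sketch below the hexacode is spanned from the generator matrix given earlier, with ω, ω̄ encoded as 2, 3 (bar placement as reconstructed in this section):

```python
from itertools import product

# Membership test for the code C of Lemma 10.3.5 (the extended binary Golay code
# in its hexacode description).  F4 = {0,1,2,3} with 2 = omega, 3 = omega-bar.
def mul(a, b):
    if a == 0 or b == 0:
        return 0
    val, exp = [1, 2, 3], {1: 0, 2: 1, 3: 2}
    return val[(exp[a] + exp[b]) % 3]

# The 64 hexacode words, spanned from the generator matrix G6 of this section.
G6 = [[1, 0, 0, 1, 3, 2], [0, 1, 0, 1, 2, 3], [0, 0, 1, 1, 1, 1]]
HEXACODE = {tuple(mul(a, G6[0][i]) ^ mul(b, G6[1][i]) ^ mul(c, G6[2][i])
                  for i in range(6))
            for a, b, c in product(range(4), repeat=3)}

def in_golay(v):
    cols = [v[4 * j:4 * j + 4] for j in range(6)]
    parities = [sum(col) % 2 for col in cols]
    top_parity = sum(col[0] for col in cols) % 2
    # (i) all columns share one parity, which equals the parity of the first row
    if len(set(parities)) != 1 or parities[0] != top_parity:
        return False
    # (ii) the projection lies in the hexacode
    proj = tuple(col[1] ^ (2 if col[2] else 0) ^ (3 if col[3] else 0) for col in cols)
    return proj in HEXACODE

assert in_golay([1] * 8 + [0] * 16)    # the weight-8 codeword used in the proof
assert not in_golay([1] + [0] * 23)    # a single 1 is not a codeword
```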
For the remainder of this section the code C in Lemma 10.3.5 will be denoted G 24 . In
order to decode G 24 by hand, we need to be able to recognize the codewords in G 6 . If we
label the coordinates by 1, 2, 3, 4, 5, 6, then the coordinates break into pairs (1, 2), (3, 4),
and (5, 6). There are permutation automorphisms that permute these three pairs in any
manner and switch the order within any two pairs; see Exercise 583. We list the codewords
Table 10.1 Nonzero codewords in the hexacode

Group number    Hexacode codeword    Number in group
I               01 01 ωω̄             3 × 12 = 36
II              ωω̄ ωω̄ ωω̄             3 × 4 = 12
III             00 11 11             3 × 3 = 9
IV              11 ωω ω̄ω̄             3 × 2 = 6
in G 6 in Table 10.1 in groups, telling how many are in each group. Each codeword in the
table is separated into pairs. Additional codewords in the group are obtained by multiplying
by ω or ω̄, permuting the pairs in any order, switching the order in any two pairs, and
by any combination of these. In the table, the column “Number in group” tells how many
codewords can be obtained by these operations. For example the group containing 00 11 11
also contains 11 00 11 and 11 11 00 together with all multiples by ω and ω̄.
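The counts in Table 10.1 can be verified by brute force. The sketch below (ours) generates each group as an orbit under scalar multiples, pair permutations, and switches within an even number of pairs, with 2 = ω and 3 = ω̄:

```python
from itertools import permutations, combinations

def mul(a, b):
    if a == 0 or b == 0:
        return 0
    val, exp = [1, 2, 3], {1: 0, 2: 1, 3: 2}   # F4 with 2 = omega, 3 = omega-bar
    return val[(exp[a] + exp[b]) % 3]

def orbit(word):
    """All words obtained by scalar multiples, permuting the three pairs,
    and switching the order within zero or two of the pairs."""
    out = set()
    pairs = [tuple(word[2 * i:2 * i + 2]) for i in range(3)]
    for scalar in (1, 2, 3):
        sp = [tuple(mul(scalar, x) for x in p) for p in pairs]
        for perm in permutations(sp):
            for k in (0, 2):                       # an even number of within-pair switches
                for which in combinations(range(3), k):
                    q = [p[::-1] if i in which else p for i, p in enumerate(perm)]
                    out.add(q[0] + q[1] + q[2])
    return out

groups = {"I": (0, 1, 0, 1, 2, 3),     # 01 01 omega omega-bar
          "II": (2, 3, 2, 3, 2, 3),    # omega omega-bar, three times
          "III": (0, 0, 1, 1, 1, 1),
          "IV": (1, 1, 2, 2, 3, 3)}
orbits = {g: orbit(w) for g, w in groups.items()}
assert [len(orbits[g]) for g in ("I", "II", "III", "IV")] == [36, 12, 9, 6]
# The four groups together account for all 63 nonzero hexacode codewords.
assert len(set().union(*orbits.values())) == 63
```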
Exercise 583 Do the following:
(a) Show that the permutations p1 = (1, 3)(2, 4), p2 = (1, 3, 5)(2, 4, 6), and p3 =
(1, 2)(3, 4) are in PAut(G 6 ).
(b) Find a product of p1 , p2 , and p3 that:
(i) interchanges the pairs (3, 4) and (5, 6),
(ii) sends (3, 4) to (4, 3) and simultaneously (5, 6) to (6, 5).
Exercise 584 Show that the number of codewords in each group of Table 10.1 is
correct.
Exercise 585 Decide if the following vectors are in the hexacode G 6 :
(a) 10ωω01,
(b) ωω0011,
(c) 0ωω0ω1,
(d) 1ωω1ω1.
We now demonstrate how to decode a received vector y = c + e where c ∈ G 24 and
wt(e) ≤ 3. The decoding falls naturally into four cases depending on the parities of the
columns of [y]. The received vector uniquely determines which case we are in. In particular,
if the received vector has odd weight, then one or three errors have occurred and we are
in Case 1 or Case 3 below. (Note that the case number is not the number of errors.) If
the received vector has even weight, then zero or two errors have occurred and we are in
Case 2 or Case 4. If we cannot correct the received vector by following the procedure in
the appropriate case, then more than three errors have been made. Because the covering
radius of G 24 is 4, if more than four errors are made, the nearest neighbor is not the correct
codeword and the procedure will fail to produce c. If four errors have been made, then Case
2 or Case 4 will arise; with more work, we can find six codewords within distance 4 of the
received vector using these same techniques as in Cases 2 or 4. We will not describe the
procedure for handling four errors; the interested reader can consult [268]. Please keep in
mind that we are assuming at most three errors have been made.
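Which case applies can be read off mechanically from the column parities; a small helper function (ours, not from [268]):

```python
def decoding_case(v):
    """Return which of the four cases of the text applies to a received vector v
    (a list of 24 bits), based on the column parities of [v]."""
    parities = [sum(v[4 * j:4 * j + 4]) % 2 for j in range(6)]
    ones = sum(parities)
    split = min(ones, 6 - ones)          # size of the smaller parity class
    return {3: 1, 2: 2, 1: 3, 0: 4}[split]

# The received vectors of Examples 10.3.6-10.3.9 fall into Cases 1-4, in order.
examples = ["101000011100001000011001",
            "100011110100000000101000",
            "110010001001000010010110",
            "011100101011000111011101"]
assert [decoding_case([int(b) for b in y]) for y in examples] == [1, 2, 3, 4]
```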
Case 1: Three columns of [y] have even parity and three have odd parity
In this case, we know that three errors must have occurred and either there is one error
in each odd parity column or there is one error in each even parity column. We look at
each possibility separately. With each choice, we know three components of Pr(y) that are
correct and hence there is only one way to fill in the coordinates in error to obtain a codeword
in G 6 .
Example 10.3.6 Suppose y = 101000011100001000011001. Then we obtain

0 | 1 0 1 0 0 1
1 | 0 0 1 0 0 0
ω | 1 0 0 1 0 0
ω̄ | 0 1 0 0 1 1

Pr(y) = ω ω̄ 1 ω ω̄ ω̄.
Notice that columns 1, 3, and 6 have even parity and columns 2, 4, and 5 have odd parity.
Suppose the columns with odd parity are correct. Then Pr(c) = ⋆ω̄⋆ωω̄⋆, where we must
fill in the ⋆s. The only possibility for Pr(c) is from group I or II in Table 10.1. If it is in
group I, it must be 0ω̄⋆ωω̄0, which cannot be completed. So Pr(c) = ωω̄ω̄ωω̄ω. Since we
make only one change in each of columns 1, 3, and 6, we obtain

0 | 0 0 1 0 0 1
1 | 0 0 1 0 0 1
ω | 1 0 1 1 0 0
ω̄ | 0 1 0 0 1 1

which projects to ω ω̄ ω̄ ω ω̄ ω.
However, the columns have odd parity while the first row has even parity, which is a
contradiction. So the columns with even parity are correct and Pr(c) = ω⋆1⋆⋆ω̄. The only
possibility for Pr(c) is from group IV; hence Pr(c) = ωω11ω̄ω̄. Since we make only one
change in each of columns 2, 4, and 5, we obtain:
0 | 1 0 1 0 1 1
1 | 0 1 1 0 0 0
ω | 1 0 0 1 0 0
ω̄ | 0 1 0 1 1 1

which projects to ω ω 1 1 ω̄ ω̄.
As all columns and the first row have even parity, this is the codeword.
Case 2: Exactly four columns of [y] have the same parity
As we are assuming that at most three errors have been made, we know that the four
columns with the same parity are correct and the other two columns each contain exactly one
error.
Example 10.3.7 Suppose y = 100011110100000000101000. Then:

0 | 1 1 0 0 0 1
1 | 0 1 1 0 0 0
ω | 0 1 0 0 1 0
ω̄ | 0 1 0 0 0 0

Pr(y) = 0 0 1 0 ω 0.
Thus columns 1, 3, 5, and 6 have odd parity and 2 and 4 have even parity. So Pr(c) = 0⋆1⋆ω0.
This must be in group I; hence Pr(c) = 0ω1ω̄ω0. We must make one change in each of
columns 2 and 4 so that the top row has odd parity. This gives
0 | 1 1 0 0 0 1
1 | 0 1 1 0 0 0
ω | 0 0 0 0 1 0
ω̄ | 0 1 0 1 0 0

which projects to 0 ω 1 ω̄ ω 0.
Since the columns and first row all have odd parity, this is the codeword.
Case 3: Exactly five columns of [y] have the same parity
As at most three errors have been made, either the five columns with the same parity are
correct and the other column has either one or three errors, or four of the five columns are
correct with two errors in the fifth and one error in the remaining column. It is easy using
Table 10.1 to determine if all five columns of the same parity are correct and then to correct
the remaining column. If four of the five columns are correct, a bit more analysis is required.
Example 10.3.8 Suppose y = 110010001001000010010110. Then:

0 | 1 1 1 0 1 0
1 | 1 0 0 0 0 1
ω | 0 0 0 0 0 1
ω̄ | 0 0 1 0 1 0

Pr(y) = 1 0 ω̄ 0 ω̄ ω̄.
Column 2 is the only column of odd parity, and so one or three errors have been made in
this column. If all but column 2 of Pr(y) is correct, it is easy to see that no change in column
2 will yield a hexacode codeword from Table 10.1. So four of columns 1, 3, 4, 5, and 6
are correct. If columns 5 and 6 are both correct, then Pr(c) must be in either group III or
IV in which case two of columns 1, 3, and 4 must be in error; this is an impossibility. In
particular, columns 1, 3, and 4 are correct, and Pr(c) = 1⋆ω̄0⋆⋆, which must be in group I.
The only possibility is Pr(c) = 1ωω̄0ω̄0. Thus the correct codeword must be:

0 | 1 1 1 0 1 0
1 | 1 0 0 0 0 0
ω | 0 1 0 0 0 0
ω̄ | 0 0 1 0 1 0

which projects to 1 ω ω̄ 0 ω̄ 0.
Note that the columns and first row all have even parity.
Case 4: All six columns of [y] have the same parity
This is the easiest case to solve. If the top row has the same parity as the columns and Pr(y)
is in G 6 , no errors were made. Otherwise, as we are assuming that at most three errors have
been made and because the parity of the six columns is correct, any column that is incorrect
must have two errors. Hence at most one column is incorrect. The correct vector Pr(c) is
easy to find and hence the incorrect column, if there is one, is then clear.
Example 10.3.9 Suppose y = 011100101011000111011101. Then:

0 | 0 0 1 0 1 1
1 | 1 0 0 0 1 1
ω | 1 1 1 0 0 0
ω̄ | 1 0 1 1 1 1

Pr(y) = 0 ω 1 ω̄ ω ω.
Clearly, all columns cannot be correct, and it is impossible for both columns 5 and 6 to be
correct. Thus as columns 1, 2, 3, and 4 must be correct, the only possibility is for Pr(c) to
be in group I and Pr(c) = 0ω1ω̄ω0. Thus two entries in column 6 must be changed and the
correct codeword is
0 | 0 0 1 0 1 1
1 | 1 0 0 0 1 0
ω | 1 1 1 0 0 0
ω̄ | 1 0 1 1 1 0

which projects to 0 ω 1 ω̄ ω 0.
Exercise 586 Decode the following received vectors when G 24 is used in transmission or
show that more than three errors have been made:
(a) 001111001111101011000011,
(b) 110011010110111010010100,
(c) 000101001101111010111001,
(d) 101010111100100100100101,
(e) 101110111111010000100110.
The tetracode can be used in a similar manner to decode the [12, 6, 6] extended ternary
Golay code; see [268]. This technique has been generalized in [143].
10.4
The ternary Golay codes
We now examine in more detail the [11, 6, 5] and [12, 6, 6] ternary Golay codes we first
constructed in Section 1.9.2. Our original construction of the [12, 6, 6] code used a bordered
double circulant generator matrix. As in the binary case, the codes are unique and have
relatively large automorphism groups.
10.4.1 Uniqueness of the ternary Golay codes
The uniqueness of the ternary Golay codes was proved in [65, 260, 269]; our proof follows
these partially and is similar to the binary case. We first prove the uniqueness of the extended
ternary Golay code through a series of exercises.
Exercise 587 Let C be a (possibly nonlinear) ternary code of length 11 and minimum
distance 5 containing M ≥ 3^6 codewords. Show that C has exactly 3^6 codewords and is
perfect. Hint: Use the Sphere Packing Bound.
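For Exercise 587, the Sphere Packing Bound calculation is short enough to verify directly:

```python
from math import comb

# Spheres of radius 2 in F_3^11 contain 1 + C(11,1)*2 + C(11,2)*2^2 vectors each.
sphere = sum(comb(11, i) * 2**i for i in range(3))
assert sphere == 243                  # = 3^5
# M * sphere <= 3^11 forces M <= 3^6; with M >= 3^6 the spheres exactly fill
# the space, so the code is perfect.
assert 3**6 * sphere == 3**11
```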
Exercise 588 Let C be a (possibly nonlinear) ternary code of length 11 and minimum
distance 5 containing M ≥ 3^6 codewords including the 0 codeword. By Exercise 587, C
contains 3^6 codewords and is perfect. Show that the weight enumerator of C is
WC(x, y) = y^11 + 132x^5 y^6 + 132x^6 y^5 + 330x^8 y^3 + 110x^9 y^2 + 24x^11.
Hint: Mimic the proof of Lemma 10.1.2. Let N(w, c) denote the number of vectors in F_3^11 of
weight w in a sphere of radius 2 centered at a codeword of weight c. The equation analogous
to (10.2) is

Σ_{c=0}^{11} N(w, c) A_c = (11 choose w) 2^w.
The computations of N (w, c) are more complicated than in the proof of Lemma 10.1.2.
Exercise 589 Let C′ be a (possibly nonlinear) ternary code of length 12 and minimum
distance 6 containing M ≥ 3^6 codewords including the 0 codeword. Show that C′ has
exactly 3^6 codewords and weight enumerator
WC′(x, y) = y^12 + 264x^6 y^6 + 440x^9 y^3 + 24x^12.
Hint: Puncturing C ′ in any coordinate yields a code described in Exercises 587 and 588.
Exercise 590 Show that if c and c′ are vectors in F_3^n such that wt(c), wt(c′), and wt(c − c′)
are all 0 modulo 3, then c and c′ are orthogonal to each other. Hint: You may assume
the nonzero entries of c all equal 1 as rescaling coordinates does not affect weight or
orthogonality.
Exercise 591 Let D be a (possibly nonlinear) ternary code of length 12 and minimum
distance 6 containing M ≥ 3^6 codewords including the 0 codeword. Show that D is a
[12, 6, 6] self-orthogonal linear code. Hint: If c ∈ D, then D and c − D satisfy the conditions
of C ′ given in Exercise 589. Use Exercise 590.
Exercise 592 Let C be a [12, 6, 6] ternary code. You will show that up to equivalence C
has a unique generator matrix. Recall that given a generator matrix G for C, an equivalent
code is produced from the generator matrix obtained from G by permuting columns and/or
multiplying columns by ±1. You can also rearrange or rescale rows and obtain the same
code. Without loss of generality we may assume that G is in standard form
G = [I6 A],
where A is a 6 × 6 matrix, whose ith row is denoted ai . We label the columns of G by
1, 2, . . . , 12.
(a) Prove that for 1 ≤ i ≤ 6, ai has at most one component equal to 0.
(b) Prove that it is not possible for two different rows ai and a j of A to have components
that agree in three or more coordinate positions.
(c) Prove that it is not possible for two different rows ai and a j of A to have components
that are negatives of each other in three or more coordinate positions.
(d) Prove that every row of A has weight 5.
(e) Prove that no two rows of A have 0 in the same coordinate position.
(f ) Prove that we may assume that

    0  1  1  1  1  1
    1
    1
A = 1
    1
    1

where the blank entries in each row are to be filled in with one 0, two 1s, and two −1s.
Hint: You may have to rescale columns 2, . . . , 6 and rows 2, . . . , 6 of G to obtain the
desired column 7 of G.
(g) Prove that we may assume that

    0  1  1  1  1  1
    1  0  1 −1 −1  1
    1     0
A = 1        0
    1           0
    1              0

where the blank entries in each row are to be filled in with two 1s and two −1s.
(h) Show that, up to equivalence, a3 = (1, 1, 0, 1, −1, −1).
(i) Prove that a4 , a5 , and a6 are uniquely determined and that
    0  1  1  1  1  1
    1  0  1 −1 −1  1
    1  1  0  1 −1 −1
A = 1 −1  1  0  1 −1
    1 −1 −1  1  0  1
    1  1 −1 −1  1  0.
The generator matrix produced above is precisely the one given in Section 1.9.2 for
the extended ternary Golay code which we showed in that section is indeed a [12, 6, 6]
code.
Exercises 587–592 show that a, possibly nonlinear, ternary code of length 12 containing
0 with at least 3^6 codewords and minimum distance 6 is, up to equivalence, the [12, 6, 6]
extended ternary Golay code.
We now turn to a ternary code of length 11. Let C be a possibly nonlinear ternary
code of length 11 and minimum distance 5 containing M ≥ 3^6 codewords including the
0 codeword. We will show that this code is an [11, 6, 5] linear code unique up to equivalence.
By Exercises 587 and 588, C is a perfect code with 3^6 codewords including codewords of
weight 11. By rescaling coordinates, we may assume that C contains the all-one
codeword 1.
Note that if c ∈ C, then c − C is a code of length 11 and minimum distance 5 containing 3^6
codewords including the 0 codeword. Thus its weight distribution is given in Exercise 588,
implying that
d(c, c′ ) ∈ {5, 6, 8, 9, 11}
(10.5)
for distinct codewords c and c′ in C.
We wish to show that −1 is in C. To do this we establish some notation. A vector in F_3^11
has “type” (−1)^a 1^b 0^c if it has a −1s, b 1s, and c 0s as components. Suppose that x ∈ F_3^11. As
C is perfect, there is a unique codeword c at distance 2 or less from x; we say that c “covers”
x. Assume −1 ∉ C. Let xi be the vector in F_3^11 with 1 in coordinate i and −1 in the other
ten coordinates. If some xi is in C, then the distance between xi and 1 is 10, contradicting
(10.5). Let ci cover xi . In particular wt(ci) is 9 or 11.
First assume that wt(ci) = 11 for some i. No codeword can have exactly one 1 as its
distance to 1 is 10, contradicting (10.5). Thus ci must have at least two 1s. To cover xi , the
only choices for ci are type (−1)^9 1^2 or (−1)^8 1^3. In either case, a 1 is in coordinate i. Now
assume that wt(ci) = 9 for some i. Then ci must have type (−1)^9 0^2 with a 0 in coordinate
i. Consider the possibilities for a ci and cj . If they are unequal, their distance apart must be
at least 5. This implies that both cannot have type (−1)^9 a^2 and (−1)^9 b^2 where a and b are
not −1. Also both cannot have type (−1)^8 1^3 with a 1 in a common coordinate. In particular
it is impossible for all ci to have type (−1)^8 1^3 since 11 is not a multiple of 3. Exercise 593
shows that by rearranging coordinates, the ci s are given in Figure 10.2.
Exercise 593 Fill in the details verifying that c1 , . . . , c11 are given in Figure 10.2.
                  1   2   3   4   5   6   7   8   9  10  11
c1 = c2           a   a  −1  −1  −1  −1  −1  −1  −1  −1  −1
c3 = c4 = c5     −1  −1   1   1   1  −1  −1  −1  −1  −1  −1
c6 = c7 = c8     −1  −1  −1  −1  −1   1   1   1  −1  −1  −1
c9 = c10 = c11   −1  −1  −1  −1  −1  −1  −1  −1   1   1   1

Figure 10.2 Configuration for c1 , . . . , c11 , where a = 0 or a = 1.
Now let x have type (−1)^9 1^2 with 1s in coordinates 1 and 3. Since d(x, c1) ≤ 3, x ∉ C.
Suppose that c covers x. Again c has weight 9 or 11. If wt(c) = 9, then it has type (−1)^9 0^2
with 0s in coordinates 1 and 3. Then d(c, c1) ≤ 3, which is a contradiction. So wt(c) = 11.
If c has type (−1)^a 1^b, then b ≤ 4 since c covers x. If b = 1 or b = 4, then d(c, 1) is 10 or 7, a
contradiction. If b = 2, then d(c, c1) ≤ 4, a contradiction. So b = 3 and c must have 1s in
both coordinates 1 and 3. Again d(c, c1) ≤ 4, a contradiction. Therefore −1 ∈ C.
So C has codewords 0, 1, and −1. Furthermore, if c ∈ C with wt(c) = 11, −c ∈ C as
follows. Choose a diagonal matrix D so that cD = 1. Then C D has minimum distance 5
and codewords 0 and 1 = cD. By the preceding argument, −1 ∈ C D implying that −cD
∈ C D. So −c ∈ C.
In summary, we know the following. C is a perfect (11, 3^6) code with minimum distance
5 containing 0 and 1. Furthermore, the negative of every weight 11 codeword of C is also
in C. We next show that the same is true of all weight 9 codewords. Exercise 594 gives the
possible types of codewords in C.
Exercise 594 Show that codewords of the given weight have only the listed types:
(a) weight 5: 1^5 0^6, (−1)^5 0^6, (−1)^2 1^3 0^6, (−1)^3 1^2 0^6,
(b) weight 6: (−1)^3 1^3 0^5, (−1)^6 0^5, 1^6 0^5,
(c) weight 8: (−1)^2 1^6 0^3, (−1)^3 1^5 0^3, (−1)^5 1^3 0^3, (−1)^6 1^2 0^3,
(d) weight 9: (−1)^3 1^6 0^2, (−1)^6 1^3 0^2,
(e) weight 11: (−1)^11, 1^11, (−1)^5 1^6, (−1)^6 1^5.
Hint: Consider each possible type for a given weight and compute its distance to each of 1
and −1. Use (10.5).
To show the negative of a weight 9 codeword c is also a codeword, rescaling all 11
coordinates by −1 if necessary, we may assume c has type (−1)^3 1^6 0^2 by Exercise 594. By
permuting coordinates
c = (1, 1, 1, 1, 1, 1, −1, −1, −1, 0, 0).
Assume −c ∉ C. Consider the weight 11 vector
x1 = (−1, −1, −1, −1, −1, −1, 1, 1, 1, 1, 1).
If x1 ∈ C, then −x1 ∈ C and d(−x1 , c) = 2, contradicting (10.5). Suppose c1 covers x1 . Then
wt(c1 ) is 9 or 11. By reordering coordinates and using Exercise 594, the only possibilities
for c1 are
10.4 The ternary Golay codes
(−1, −1, −1, −1, −1, 1, −1, 1, 1, 1, 1),
(−1, −1, −1, −1, −1, 1, 1, 1, 1, 1, 1),
(−1, −1, −1, −1, −1, 1, 1, 1, 1, 1, −1),
(−1, −1, −1, −1, −1, −1, 1, 1, 0, 1, 0),
or
(−1, −1, −1, −1, −1, −1, 1, 0, 0, 1, 1).
In the first case, d(−c1 , c) = 4 and in the next three cases, d(c1 , c) = 10, contradicting
(10.5). Only the last one is possible. Let
x2 = (−1, −1, −1, −1, −1, −1, 1, 1, 1, 1, −1),
which is not in C by Exercise 594. Suppose c2 covers x2 . By testing possibilities as above
and after reordering coordinates we may assume
c2 = (−1, −1, −1, −1, −1, 0, 0, 1, 1, 1, −1).
Finally, let
x3 = (−1, −1, −1, −1, −1, −1, 1, 1, 1, −1, 1),
which again is not in C by Exercise 594. No codeword c3 can be constructed covering x3
under the assumption that c1 and c2 are in C. This proves that the negative of a weight 9
codeword c is also a codeword.
Exercise 595 Fill in the details showing that c2 is as claimed and c3 does not exist.
Let D be the subcode of C consisting of the 110 codewords of weight 9. Let x ∈ C and
y ∈ D. We show that x and y are orthogonal. By Exercise 596, since wt(y) ≡ 0 (mod 3),
wt(x) + x · y ≡ d(x, y) (mod 3)    (10.6)
and
wt(x) − x · y ≡ d(x, −y) (mod 3).    (10.7)
Since y ∈ D and −y ∈ D, d(x, y) and d(x, −y) can each only be 0 or 2 modulo 3. If
wt(x) ≡ 0 (mod 3) and either d(x, y) or d(x, −y) is 0 modulo 3, then x · y ≡ 0 (mod 3) by
(10.6) or (10.7), respectively. If d(x, y) = d(x, −y) ≡ 2 (mod 3), then x · y ≡ 2 (mod 3) by
(10.6) and x · y ≡ 1 (mod 3) by (10.7), a conflict. If wt(x) ≡ 2 (mod 3) and either d(x, y) or
d(x, −y) is 2 modulo 3, then x · y ≡ 0 (mod 3) by (10.6) or (10.7), respectively. If d(x, y) =
d(x, −y) ≡ 0 (mod 3), then x · y ≡ 1 (mod 3) by (10.6) and x · y ≡ 2 (mod 3) by (10.7), a
conflict. In all cases x is orthogonal to y.
Therefore the linear code D1 spanned by the codewords of D is orthogonal to the linear
code C1 spanned by C. As D has 110 codewords and 110 > 3^4, D1 has dimension at least
5. As C has 3^6 codewords, C1 has dimension at least 6. Since the sum of the dimensions
of D1 and C1 is at most 11, the dimensions of C1 and D1 are exactly 6 and 5, respectively.
Thus C1 = C implying C is linear. If we extend C, we obtain a [12, 6] linear code Ĉ. By
Exercise 594 the weight 5 codewords of C extend to weight 6, implying Ĉ is a [12, 6, 6]
linear code. By Exercise 592, Ĉ is the extended ternary Golay code. In the next subsection
we will see that Ĉ has a transitive automorphism group; by Theorem 1.7.13, all 12 codes
obtained by puncturing Ĉ in any of its coordinates are equivalent. Thus C is unique.
Exercise 596 Prove that if x and y are in F_3^n, then
d(x, y) ≡ wt(x) + wt(y) + x · y (mod 3).
Hint: By rescaling, which does not change distance or inner product modulo 3, you may
assume all nonzero components of x equal 1.
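The congruence in Exercise 596 is easy to spot-check by machine; the sketch below (a sanity check on random ternary vectors, not a proof) verifies it coordinate by coordinate:

```python
import random

def wt(v):
    # Hamming weight of a ternary vector
    return sum(x % 3 != 0 for x in v)

def dist(x, y):
    # Hamming distance between ternary vectors
    return sum((a - b) % 3 != 0 for a, b in zip(x, y))

random.seed(1)                       # deterministic trials
ok = True
for _ in range(1000):
    n = random.randint(1, 12)
    x = [random.randrange(3) for _ in range(n)]
    y = [random.randrange(3) for _ in range(n)]
    ip = sum(a * b for a, b in zip(x, y))
    # d(x, y) = wt(x) + wt(y) + x.y (mod 3)
    ok = ok and (dist(x, y) - wt(x) - wt(y) - ip) % 3 == 0
print(ok)                            # True
```

The identity in fact holds coordinatewise: [a ≠ b] ≡ [a ≠ 0] + [b ≠ 0] + ab (mod 3) for a, b ∈ F3, which is exactly what the rescaling hint reduces to.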
This discussion and the accompanying exercises have proved the following theorem.
Theorem 10.4.1 Possibly nonlinear ternary codes of length 11 and of length 12, each
containing 0, with M ≥ 3^6 codewords and minimum distance 5 and 6, respectively, are
unique up to equivalence. They are the [11, 6, 5] and [12, 6, 6] ternary Golay codes.
10.4.2 Properties of ternary Golay codes
In Example 6.6.11, we gave generating idempotents for the odd-like ternary quadratic
residue codes of length 11. One such code has generating idempotent
e11(x) = −(x^2 + x^6 + x^7 + x^8 + x^10)
and generator polynomial
g11(x) = −1 + x^2 − x^3 + x^4 + x^5.
This code has minimum distance d0 ≥ 4 by Theorem 6.6.22. By Theorem 6.6.14, the extended code is a [12, 6] self-dual code with minimum distance a multiple of 3. Therefore
d0 ≥ 5, and by Theorem 10.4.1 this is one representation of the [11, 6, 5] ternary Golay
code G 11 . Extending this odd-like quadratic residue code gives one representation of the
[12, 6, 6] extended ternary Golay code G 12 .
Exercise 597 Verify that g11 (x) is the generator polynomial for the cyclic code with
generating idempotent e11 (x).
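Exercise 597 can be checked mechanically with polynomial arithmetic over F3. In the sketch below, coefficient lists run from the constant term upward with −1 written as 2; `polydivmod` and `polygcd` are our own helper names:

```python
def polydivmod(a, b, p=3):
    # division with remainder in F_p[x]; coefficients listed lowest degree first
    a = a[:]
    inv = pow(b[-1], p - 2, p)              # inverse of b's leading coefficient
    q = [0] * max(len(a) - len(b) + 1, 1)
    for shift in range(len(a) - len(b), -1, -1):
        c = (a[shift + len(b) - 1] * inv) % p
        if c:
            q[shift] = c
            for i, bc in enumerate(b):
                a[shift + i] = (a[shift + i] - c * bc) % p
    r = a[:len(b) - 1] or [0]
    while len(r) > 1 and r[-1] == 0:
        r.pop()
    return q, r

def polygcd(a, b, p=3):
    # Euclidean algorithm in F_p[x], normalized to a monic gcd
    while any(b):
        a, b = b, polydivmod(a, b, p)[1]
    inv = pow(a[-1], p - 2, p)
    return [(c * inv) % p for c in a]

# e11(x) = -(x^2 + x^6 + x^7 + x^8 + x^10), g11(x) = -1 + x^2 - x^3 + x^4 + x^5
e11 = [0, 0, 2, 0, 0, 0, 2, 2, 2, 0, 2]
g11 = [2, 0, 1, 2, 1, 1]
xn1 = [2] + [0] * 10 + [1]                  # x^11 - 1 over F_3

_, r = polydivmod(e11, g11)
print(r)                         # [0]: g11 divides e11, so e11 lies in <g11>
print(polygcd(e11, xn1) == g11)  # True: e11 generates exactly the code <g11>
```

Since the cyclic code generated by e11(x) equals the code generated by gcd(e11(x), x^11 − 1), the two assertions together verify the exercise.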
Exercise 598 Create the generator matrix for the extended quadratic residue code extending
the cyclic code with generator polynomial g11 (x), adding the parity check coordinate on
the right. Row reduce this generator matrix and show that columns can indeed be permuted
and scaled to obtain the generator matrix in Exercise 592(i).
As in Section 10.1.2, the following permutations are automorphisms of G 12 , viewed as
the extended quadratic residue code obtained by extending the cyclic code with generating
idempotent e11 (x), where the coordinates are labeled 0, 1, . . . , 10, ∞:
s12 = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), and
µ3 = (1, 3, 9, 5, 4)(2, 6, 7, 10, 8).
The element from the Gleason–Prange Theorem is
ν12 = diag(1, 1, −1, 1, 1, 1, −1, −1, −1, 1, −1, −1)
×(0, ∞)(1, 10)(2, 5)(3, 7)(4, 8)(6, 9).
These three elements along with
η12 = (1, 9)(4, 5)(6, 8)(7, 10)
generate ΓAut(G12) = MAut(G12). This group has order 95040 · 2, and is isomorphic to
a group sometimes denoted M̂12.¹ This group has a center of order 2, and modulo this
center, the quotient group is the 5-fold transitive Mathieu group M12 of order 95040 =
12 · 11 · 10 · 9 · 8. This quotient group can be obtained by dropping the diagonal part of all
elements of MAut(G12); this is the group MAutPr(G12) defined in Section 1.7 to be the set
{P | DP ∈ MAut(G12)}.
Exercise 599 Verify that ν12 and η12 are in ΓAut(G12). See Exercise 575.
Again we can find the covering radius of each ternary Golay code.
Theorem 10.4.2 The covering radii of the [11, 6, 5] and [12, 6, 6] ternary Golay codes are
2 and 3, respectively.
Exercise 600 Prove Theorem 10.4.2.
The next example illustrates a construction of the extended ternary Golay code from the
projective plane of order 3; it is due to Drápal [73].
Example 10.4.3 The projective plane PG(2, 3) has 13 points and 13 lines with four points
per line and four lines through any point; see Theorem 8.6.2. Label the points 0, 1, 2, . . . , 12.
Denote the four lines through 0 by ℓ1 , . . . , ℓ4 and the remaining lines by ℓ5 , . . . , ℓ13 . Form
vectors vi in F_3^13 of weight 4 with supp(vi) = ℓi whose nonzero components are 1 if
1 ≤ i ≤ 4 and 2 if 5 ≤ i ≤ 13. For 1 ≤ i < j ≤ 13, let vi,j = vi − vj and let v*i,j be vi,j punctured
on coordinate 0. Let C be the ternary code of length 12 spanned by v*i,j for 1 ≤ i < j ≤ 13.
Then C is the [12, 6, 6] extended ternary Golay code as Exercise 601 shows.
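Example 10.4.3 can be verified by machine. The sketch below builds PG(2, 3) from scratch; the point and line labelings are an arbitrary choice (not the labeling of Exercise 601(c)), which is harmless since the construction only needs some point to play the role of 0:

```python
from itertools import product

def canon(p):
    # canonical representative of a projective point of PG(2,3):
    # scale so the first nonzero coordinate is 1 (x*x = 1 in F_3 for x != 0)
    for x in p:
        if x:
            return tuple((x * c) % 3 for c in p)

points = sorted({canon(p) for p in product(range(3), repeat=3) if any(p)})
# each triple h also indexes the line {p | h . p = 0}; four points per line
lines = [frozenset(i for i, p in enumerate(points)
                   if sum(a * b for a, b in zip(h, p)) % 3 == 0) for h in points]

p0 = 0                                    # our choice for the point "0"
through = [l for l in lines if p0 in l]   # the four lines through p0
others = [l for l in lines if p0 not in l]
vs = [[1 if i in l else 0 for i in range(13)] for l in through] \
   + [[2 if i in l else 0 for i in range(13)] for l in others]

gens = []
for i in range(13):
    for j in range(i + 1, 13):
        v = [(a - b) % 3 for a, b in zip(vs[i], vs[j])]
        del v[p0]                         # puncture on coordinate p0
        gens.append(v)

basis = {}                                # pivot column -> row, echelon form mod 3
for v in gens:
    for p in sorted(basis):
        if v[p]:
            c = v[p]
            v = [(x - c * y) % 3 for x, y in zip(v, basis[p])]
    if any(v):
        p = next(k for k, x in enumerate(v) if x)
        basis[p] = [(v[p] * x) % 3 for x in v]   # v[p] is its own inverse mod 3

rows = list(basis.values())
minwt = 12
for coeffs in product(range(3), repeat=len(rows)):   # all codewords of the span
    w = [sum(c * r for c, r in zip(coeffs, col)) % 3 for col in zip(*rows)]
    if any(w):
        minwt = min(minwt, sum(x != 0 for x in w))

print(len(points), len(through), len(rows), minwt)   # 13 4 6 6
```

The output confirms a span of dimension 6 and minimum weight 6, i.e. a [12, 6, 6] code, as Exercise 601 asks.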
Exercise 601 Use the notation of Example 10.4.3.
(a) Show that wt(v*i,j) = 6.
(b) Show that v*i,j is orthogonal to v*m,n for 1 ≤ i < j ≤ 13 and 1 ≤ m < n ≤ 13. Hint:
Consider the inner product of v*a with v*b, where these vectors are va and vb punctured on
coordinate 0. It will be helpful to examine the cases 1 ≤ a ≤ 4, 1 ≤ b ≤ 4, 5 ≤ a ≤ 13,
and 5 ≤ b ≤ 13 separately.
(c) Show that by labeling the coordinates and lines appropriately we can assume that ℓ1 =
{0, 1, 2, 3}, ℓ2 = {0, 4, 5, 6}, ℓ3 = {0, 7, 8, 9}, ℓ4 = {0, 10, 11, 12}, ℓ5 = {1, 4, 7, 10},
ℓ6 = {1, 5, 8, 11}, ℓ7 = {1, 6, 9, 12}, and ℓ8 = {2, 4, 8, 12}.
(d) With the lines in part (c), show that {v*1,j | 2 ≤ j ≤ 6 or j = 8} are six linearly independent
vectors in C.
(e) Show that C is a [12, 6] self-dual code.
(f) Show that C has minimum weight 6. Hint: Assume C has minimum weight d < 6. By
Theorem 1.4.10(i), d = 3. By part (e), C is self-dual. Show that there is no weight 3
vector in F_3^12 that is orthogonal to all the vectors in part (d).
¹ M̂12 is technically the double cover of M12, or the nonsplitting central extension of M12 by a center of
order 2.
10.5 Symmetry codes
In Section 1.9.2 and Exercise 592 we gave a construction of the [12, 6, 6] extended ternary
Golay code using a double circulant generator matrix. This construction was generalized
by Pless in [263]. The resulting codes are known as the Pless symmetry codes.
The symmetry codes are self-dual ternary codes of length 2q + 2, where q is a power of
an odd prime with q ≡ 2 (mod 3); these codes will be denoted S(q). A generator matrix for
S(q) is [Iq+1 Sq ], where Sq is the (q + 1) × (q + 1) matrix defined by (8.19) in connection
with our construction of Hadamard matrices in Section 8.9. When q is prime, Sq is a circulant
matrix and [Iq+1 Sq ] is a double circulant generator matrix for S(q). Example 8.9.9 gives
S5 . This is the same as the matrix A in Exercise 592 and [I6 S5 ] is the generator matrix G 12
given in Section 1.9.2 for the [12, 6, 6] extended ternary Golay code. Using the properties
of Sq developed in Theorem 8.9.11 we can show that S(q) is a Type III code.
Theorem 10.5.1 Let q be a power of an odd prime with q ≡ 2 (mod 3). Then, modulo 3,
the following matrix equalities hold:
(i) If q ≡ 1 (mod 4), then Sq = Sq^T.
(ii) If q ≡ −1 (mod 4), then Sq = −Sq^T.
(iii) Sq Sq^T = −Iq+1.
Furthermore, S(q) is a Type III code.
Proof: Parts (i) and (ii) are the same as parts (i) and (ii) of Theorem 8.9.11. Since q ≡
2 (mod 3), part (iii) follows from Theorem 8.9.11(iii) upon reducing modulo 3.
The inner product of a row of Sq with itself is −1 by (iii). Hence the inner product of a
row of [Iq+1 Sq ] with itself is 0. The inner product of one row of Sq with a different row is
0 by (iii). Hence the inner product of one row of [Iq+1 Sq ] with a different row is 0. Thus
[Iq+1 Sq ] generates a self-orthogonal code; as its dimension is clearly q + 1, the code is
Type III.
There is another generator matrix for S(q) that, when used in combination with the
original generator matrix, allows one to reduce the amount of computation required to
calculate the minimum weight. The proof is left as an exercise.
Corollary 10.5.2 Let q be a power of an odd prime with q ≡ 2 (mod 3).
(i) If q ≡ 1 (mod 4), then [−Sq Iq+1 ] is a generator matrix of S(q).
(ii) If q ≡ −1 (mod 4), then [Sq Iq+1 ] is a generator matrix of S(q).
Exercise 602 Prove Corollary 10.5.2.
Example 10.5.3 We give a quick proof that S(5) has minimum weight 6. By Theorem 10.5.1, S(5) is self-dual and by Corollary 10.5.2, [I6 S5 ] and [−S5 I6 ] both generate
S(5). If S(5) has minimum weight 3, a minimum weight codeword has weight 0 or 1 on
either the left-half or right-half of the coordinates. But as the left six and right six coordinates are each information positions, the only codeword that is zero on the left- or right-half
is the zero codeword. The only codewords that have weight 1 on the left- or right-half are
scalar multiples of the rows of [I6 S5 ] and [−S5 I6 ], all of which have weight 6. Hence
S(5) has minimum weight 6 since its minimum weight is a multiple of 3. In a similar, but
more complicated, fashion you can prove that the minimum weight of S(11) is 9; if it is not,
then a minimum weight codeword must be a combination of at most three rows of [I12 S11 ]
or [S11 I12 ]. This requires knowing only the weights of combinations of at most three rows
of S11 .
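The assertions of Example 10.5.3 about S(5) can also be confirmed by brute force. The sketch below uses a Paley conference matrix of order 6 as a stand-in for S5; its row/column labeling and signs may differ from those of (8.19), but any such choice yields a monomially equivalent code:

```python
from itertools import product

def chi(a, q):
    # quadratic character of F_q (chi(0) = 0)
    a %= q
    return 0 if a == 0 else (1 if pow(a, (q - 1) // 2, q) == 1 else -1)

def conference(q):
    # Paley conference matrix of order q + 1 for a prime q = 1 (mod 4);
    # rows and columns are labeled infinity, 0, 1, ..., q-1
    n = q + 1
    C = [[0] * n for _ in range(n)]
    for j in range(1, n):
        C[0][j] = C[j][0] = 1
    for i in range(1, n):
        for j in range(1, n):
            C[i][j] = chi(j - i, q)
    return C

q = 5
k = q + 1
S5 = conference(q)
G = [[int(c == r) for c in range(k)] + [x % 3 for x in S5[r]] for r in range(k)]

# every pair of rows of [I | S5] is orthogonal mod 3, so the code is self-dual
ok = all(sum(a * b for a, b in zip(G[r], G[s])) % 3 == 0
         for r in range(k) for s in range(k))
print(ok)                                    # True

weights = set()
for coeffs in product(range(3), repeat=k):   # all 3^6 codewords
    w = [sum(c * g for c, g in zip(coeffs, col)) % 3 for col in zip(*G)]
    if any(w):
        weights.add(sum(x != 0 for x in w))
print(min(weights), sorted(weights))         # 6 [6, 9, 12]
```

The minimum weight 6 and the fact that every weight is a multiple of 3 match the Type III behavior established in Theorem 10.5.1.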
Corollary 10.5.2 leads directly to an automorphism of S(q) that interchanges the coordinates on the left-half with the coordinates on the right-half. The proof is again left as an
exercise.
Corollary 10.5.4 Let q be a power of an odd prime with q ≡ 2 (mod 3), and let Oq+1 be
the (q + 1) × (q + 1) zero matrix.
(i) If q ≡ 1 (mod 4), then
[ Oq+1   Iq+1 ]
[ −Iq+1  Oq+1 ]    (10.8)
is a monomial automorphism of S(q).
(ii) If q ≡ −1 (mod 4), then
[ Oq+1  Iq+1 ]
[ Iq+1  Oq+1 ]    (10.9)
is a permutation automorphism of S(q).
Exercise 603 Prove Corollary 10.5.4.
The codes S(q) where q = 5, 11, 17, 23, and 29 are all extremal codes, as shown in
[261, 263] using both theoretical and computer techniques. In each case, 2q + 2 is a multiple
of 12 and so by Theorem 9.3.10 codewords of weight k with (q + 1)/2 + 3 ≤ k ≤ q + 4
form 5-(2q + 2, k, λ) designs. Table 10.2 gives the parameters of these designs where “b”
denotes the number of blocks. Notice that the number of codewords of weight k in S(q)
is twice the number of blocks in the design since the support of a codeword of weight k
determines the codeword up to multiplication by ±1; see Exercise 451.
The symmetry code S(5) is the [12, 6, 6] extended ternary Golay code and the latter is
unique. It can be shown that by extending the ternary quadratic residue code of length 23, a
[24, 12, 9] extremal Type III code is obtained. This code is not equivalent to the [24, 12, 9]
symmetry code S(11). (One way to show the two codes are inequivalent is to note that
the extended quadratic residue code has an automorphism group of order 12 144, while
the symmetry code has a subgroup of its automorphism group of order 5280 which is not
a divisor of 12 144.) In [191], it was shown that these two codes are the only extremal
Type III codes of length 24. The only extremal Type III code of length 36 that is known
is the symmetry code S(17); it has been shown in [146] that this is the only extremal
Type III code with an automorphism of prime order 5 or more. This suggests the following
problem.
Research Problem 10.5.5 Find all [36, 18, 12] Type III codes.
Table 10.2 Parameters for 5-(v, k, λ) designs from S(q)

 q    v    k                λ                  b
 5   12    6                1                132
 5   12    9               35                220
11   24    9                6              2 024
11   24   12              576             30 912
11   24   15            8 580            121 440
17   36   12               45             21 420
17   36   15            5 577            700 128
17   36   18          209 685          9 226 140
17   36   21        2 438 973         45 185 184
23   48   15              364            207 552
23   48   18           50 456         10 083 568
23   48   21        2 957 388        248 854 848
23   48   24       71 307 600      2 872 677 600
23   48   27      749 999 640     15 907 684 672
29   60   18            3 060          1 950 540
29   60   21          449 820        120 728 160
29   60   24       34 337 160      4 412 121 480
29   60   27    1 271 766 600     86 037 019 040
29   60   30   24 140 500 956    925 179 540 912
29   60   33  239 329 029 060  5 507 375 047 020
In [263], a subgroup of ΓAut(S(q)) = MAut(S(q)) has been found. There is evidence that
this subgroup may be the entire automorphism group of S(q) for q > 5, at least when q is a
prime. The monomial automorphism group contains Z = {I2q+2 , −I2q+2 }. The group Z is
the center of MAut(S(q)). If q ≡ 1 (mod 4), the automorphism (10.8) generates a subgroup
J 4 of MAut(S(q)) of order 4 containing Z. If q ≡ −1 (mod 4), the automorphism (10.9)
generates a subgroup J 2 of MAut(S(q)) of order 2 intersecting Z only in the identity
I2q+2 . Finally, MAut(S(q)) contains a subgroup P such that P/Z is isomorphic to the
projective general linear group PGL2 (q), which is a group of order q(q 2 − 1). If q ≡ 1
(mod 4), MAut(S(q)) contains a subgroup generated by P and J 4 of order 4q(q 2 − 1); if
q ≡ −1 (mod 4), MAut(S(q)) contains a subgroup generated by P and J 2 also of order
4q(q 2 − 1).
Research Problem 10.5.6 Find the complete automorphism group of S(q).
10.6 Lattices and self-dual codes
In this section we give a brief introduction to lattices over the real numbers and their
connection to codes, particularly self-dual codes. We will see that the Golay codes lead to
important lattices. We will encounter lattices again in Section 12.5.3. The reader interested
in further connections among codes, lattices, sphere packings, quantization, and groups
should consult either [60] or [76].
We first define a lattice in R^n. Let v1, . . . , vn be n linearly independent vectors in R^n,
where vi = vi,1 vi,2 · · · vi,n. The lattice Λ with basis {v1, . . . , vn} is the set of all integer
combinations of v1, . . . , vn; these integer combinations of basis vectors are the points of
the lattice. Thus
Λ = {z1 v1 + z2 v2 + · · · + zn vn | zi ∈ Z, 1 ≤ i ≤ n}.
The matrix
M = [ v1,1  v1,2  · · ·  v1,n ]
    [ v2,1  v2,2  · · ·  v2,n ]
    [  ·     ·     ·      ·   ]
    [ vn,1  vn,2  · · ·  vn,n ]
is called the generator matrix of Λ; thus Λ = {zM | z ∈ Z^n}. The n × n matrix
A = M M^T
is called the Gram matrix for Λ; its (i, j)-entry is the (ordinary) inner product (over the real
numbers) of vi and vj. The determinant or discriminant of the lattice Λ is
det Λ = det A = (det M)^2.
This determinant has a natural geometric meaning: det Λ is the square of the volume of the
fundamental parallelotope of Λ, which is the region
{a1 v1 + · · · + an vn | 0 ≤ ai < 1, ai ∈ R for 1 ≤ i ≤ n}.
Example 10.6.1 The lattice Z2 with basis v1 = (1, 0) and v2 = (0, 1) has generator and
Gram matrices both equal to the identity matrix I2 . Also det Z2 = 1. This lattice is clearly
the set of all points in the plane with integer coordinates.
Example 10.6.2 Let v1 = (√2, 0) and v2 = (√2/2, √6/2). The lattice Λ2, shown in
Figure 10.3, with basis {v1, v2} has generator and Gram matrices
M2 = [ √2     0   ]    and    A2 = [ 2  1 ]
     [ √2/2  √6/2 ]                [ 1  2 ].
This lattice is often called the planar hexagonal lattice; its determinant is 3.
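These matrices are easy to reproduce numerically (floating point, so entries are compared up to rounding):

```python
import math

v1 = (math.sqrt(2), 0.0)
v2 = (math.sqrt(2) / 2, math.sqrt(6) / 2)
M = [v1, v2]

# Gram matrix A = M M^T: entries are inner products of the basis vectors
A = [[sum(a * b for a, b in zip(r, s)) for s in M] for r in M]
detM = M[0][0] * M[1][1] - M[0][1] * M[1][0]
detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]

print([[round(x, 6) for x in row] for row in A])   # [[2.0, 1.0], [1.0, 2.0]]
print(round(detA, 6), round(detM ** 2, 6))         # 3.0 3.0
```

Note that |det M| = √3 here, so det Λ2 = (det M)^2 = 3, consistent with the definitions above.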
Exercise 604 Let Λ2 be the lattice of Example 10.6.2. Do the following:
(a) Draw the fundamental region of Λ2 in Figure 10.3.
(b) Verify geometrically that the fundamental region has area √3, the square root of det Λ2.
If the Gram matrix of a lattice has integer entries, then the inner product between any
two lattice points is always integer valued, and the lattice is called integral. The Z2 and
planar hexagonal lattices defined in Examples 10.6.1 and 10.6.2 are integral. Note that in
any lattice, we could rescale all the basis vectors by the same scalar; this does not change the
“geometry” of the lattice, that is, the relative positions in R^n of the lattice points, but could
change other properties, such as the integrality of the lattice or the size of the fundamental
region (det Λ). If we multiply each basis vector of Λ by the scalar c, then the resulting
Figure 10.3 Planar hexagonal lattice in R^2 (lattice points shown, with 0, v1, and v2 marked).
lattice is denoted cΛ. The generator matrix of cΛ is cM and the Gram matrix is c^2 A. We
also essentially obtain the same lattice if we rotate or reflect Λ. This leads to the concept
of equivalence: the lattice Λ with generator matrix M is equivalent to the lattice Λ′ if there
is a scalar c, a matrix U with integer entries and det U = ±1, and a matrix B with real
entries and B B^T = In such that cU M B is a generator matrix of Λ′. (The matrix B acts
as a series of rotations and reflections, while the matrix U changes the basis of the lattice.)
Generally, when choosing a representative lattice in an equivalence class of lattices, one
attempts to choose a representative that is integral and has determinant as small as possible.
Using the ordinary inner product in R^n, we can define the dual lattice Λ* of Λ to be
the set of vectors in R^n whose inner product with every lattice point in Λ is an integer;
that is,
Λ* = {y ∈ R^n | x · y ∈ Z for all x ∈ Λ}.
An integral lattice is called unimodular or self-dual if Λ = Λ*.
Theorem 10.6.3 Let Λ be a lattice in R^n with basis v1, . . . , vn, generator matrix M, and
Gram matrix A. The following hold:
(i) Λ* = {y ∈ R^n | vi · y ∈ Z for all 1 ≤ i ≤ n} = {y ∈ R^n | yM^T ∈ Z^n}.
(ii) The generator matrix of Λ* is (M^−1)^T.
(iii) The Gram matrix of Λ* is A^−1.
(iv) det Λ* = 1/det Λ.
(v) Λ is integral if and only if Λ ⊆ Λ*.
(vi) If Λ is integral, then
Λ ⊆ Λ* ⊆ (1/det Λ)Λ = (det Λ*)Λ.
(vii) If Λ is integral, then Λ is unimodular if and only if det Λ = ±1.
Proof: We leave the proofs of (i) and (v) for Exercise 605. For (ii), let w1, . . . , wn be the
rows of (M^−1)^T. Then {w1, . . . , wn} is a basis of R^n and, as (M^−1)^T M^T = (M M^−1)^T = In,
wi · vj = 1 if i = j, and wi · vj = 0 if i ≠ j.    (10.10)
In particular, {w1, . . . , wn} ⊆ Λ* by (i). Let w ∈ Λ*. Then w = a1 w1 + · · · + an wn as
{w1, . . . , wn} is a basis of R^n. As w ∈ Λ*, w · vj ∈ Z. But w · vj = aj for 1 ≤ j ≤ n by
(10.10). Hence aj ∈ Z and (ii) holds. The Gram matrix of Λ* is (M^−1)^T M^−1 = (M M^T)^−1 =
A^−1, yielding (iii). Thus
det Λ* = det A^−1 = 1/det A = 1/det Λ,
producing (iv).
For (vi) we only need to prove the second containment by parts (iv) and (v). Let
y ∈ Λ*. Then yM^T ∈ Z^n by (i), and hence there exists z ∈ Z^n such that y = z(M^T)^−1 =
z(M^T)^−1 M^−1 M = z(M M^T)^−1 M = zA^−1 M. But A^−1 = (det A)^−1 adj(A), where adj(A) is
the adjoint of A. Hence
y = z(det A)^−1 adj(A)M = z′(det A)^−1 M,
where z′ = z adj(A) ∈ Z^n as adj(A) has integer entries since Λ is integral. Thus y ∈
(det Λ)^−1 Λ, verifying (vi).
Finally by (iv) if Λ is unimodular, then det Λ = det Λ* = 1/(det Λ) implying det Λ =
±1. Conversely, if det Λ = ±1 and Λ is integral, (vi) shows that Λ ⊆ Λ* ⊆ ±Λ and so
Λ = Λ* as clearly Λ = ±Λ.
Exercise 605 Prove Theorem 10.6.3(i) and (v).
Exercise 606 Prove that if Λ is a lattice, then Λ = cΛ if and only if c = ±1.
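Parts (ii)–(iv) of Theorem 10.6.3 can be checked numerically for the planar hexagonal lattice of Example 10.6.2 (a spot check in one example, not a proof):

```python
import math

M = [[math.sqrt(2), 0.0], [math.sqrt(2) / 2, math.sqrt(6) / 2]]

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    # inverse of a 2 x 2 matrix
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tr(X):
    return [list(r) for r in zip(*X)]

Mstar = tr(inv2(M))               # generator matrix of the dual: (M^-1)^T
Astar = mul(Mstar, tr(Mstar))     # Gram matrix of the dual
A = mul(M, tr(M))
Ainv = inv2(A)

# the Gram matrix of the dual is A^-1, and det(dual) = 1/det
same = all(abs(Astar[i][j] - Ainv[i][j]) < 1e-9 for i in range(2) for j in range(2))
print(same, round(det2(Astar), 9), round(1 / det2(A), 9))   # True 0.333333333 0.333333333
```

Here det Λ2 = 3, so the dual has determinant 1/3, as part (iv) predicts.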
In an integral lattice Λ, x · x is an integer for all x ∈ Λ. An integral lattice is called even
if x · x is an even integer for all x ∈ Λ. In particular, the Gram matrix has even entries on its
main diagonal. An integral lattice that is not even is called odd. A lattice is Type II provided
it is an even unimodular lattice and Type I if it is an odd unimodular lattice. Type I lattices
exist for any dimension n; however, Type II lattices exist if and only if n ≡ 0 (mod 8). The
unimodular lattices have been classified up to equivalence through dimension 25; note that
Type I codes have been classified through length 30 and Type II through length 32.
Exercise 607 Show that the lattice Z2 from Example 10.6.1 is a Type I lattice while Λ2
from Example 10.6.2 is not unimodular.
Analogous to measuring the Hamming distance between codewords, we measure the
square of the Euclidean distance, or norm, between lattice points. The norm of a vector
v = v1 · · · vn in R^n is
N(v) = v · v = Σ_{i=1}^{n} vi^2.
The minimum squared distance, or minimum norm, between two points in a lattice Λ,
denoted µ, is
µ = min{N(x − y) | x, y ∈ Λ, x ≠ y} = min{N(x) | x ∈ Λ, x ≠ 0}.
The minimum norm is analogous to the minimum distance of a code. With lattices there is a
power series, called the theta series, that corresponds to the weight enumerator of a code.
The theta series Θ_Λ(q) of a lattice Λ is
Θ_Λ(q) = Σ_{x∈Λ} q^{x·x}.
If Λ is integral and Nm is the number of lattice points of norm m, then
Θ_Λ(q) = Σ_{m=0}^{∞} Nm q^m.
For both Type I and Type II lattices, there is a result known as Hecke’s Theorem, corresponding to Gleason’s Theorem for self-dual codes, that states that the theta series of such a
lattice is a complex power series in two specific power series which depend on whether the
lattice is Type I or Type II. As with self-dual codes, Hecke’s Theorem can also be used to
produce an upper bound on the minimum norm of a Type I or II lattice: for Type I the bound
is µ ≤ ⌊n/8⌋ + 1, and for Type II the bound is µ ≤ 2 ⌊n/24⌋ + 2. As Gleason’s Theorem
implies that Type II codes exist only for lengths a multiple of 8, Hecke’s Theorem implies
that Type II lattices exist in Rn only when n ≡ 0 (mod 8).
Exercise 608 Let Λ2 be the lattice of Example 10.6.2. Do the following:
(a) Verify that the minimum norm of the lattice is 2.
(b) Show that the norm of any lattice point of Λ2 is an even integer.
(c) In Figure 10.3, find all lattice points whose norm is:
(i) 2,
(ii) 4,
(iii) 6,
and write these points as integer combinations of {v1, v2}.
We can tie the concept of minimum norm of a lattice to the notion of a lattice packing.
If Λ is a lattice with minimum norm µ, then the n-dimensional spheres of radius ρ = √µ/2
centered at lattice points form a lattice packing in R^n. These spheres do not overlap except
at their boundary. The number of spheres touching the sphere centered at 0 is the number
of lattice points of minimum norm µ; this number is called the kissing number. So the
kissing number in lattices corresponds to the number of minimum weight codewords in a
code. For a variety of reasons it is of interest to find lattices with large kissing numbers.
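Kissing numbers of low-dimensional lattices can be found by enumerating a small window of integer coefficients; the sketch below treats Z2 and the hexagonal lattice of Example 10.6.2 (compare Exercise 610):

```python
import math

def kissing(basis, window=4):
    # enumerate z1*v1 + z2*v2 over a small window of integer coefficients;
    # in dimension 2 this is plenty to find all vectors of minimum norm
    norms = []
    for z1 in range(-window, window + 1):
        for z2 in range(-window, window + 1):
            if (z1, z2) != (0, 0):
                x = [z1 * a + z2 * b for a, b in zip(*basis)]
                norms.append(x[0] ** 2 + x[1] ** 2)
    mu = min(norms)
    return mu, sum(abs(n - mu) < 1e-9 for n in norms)

print(kissing([(1.0, 0.0), (0.0, 1.0)]))    # (1.0, 4) for Z^2
hex_basis = [(math.sqrt(2), 0.0), (math.sqrt(2) / 2, math.sqrt(6) / 2)]
print(kissing(hex_basis))                   # minimum norm 2 (up to rounding), kissing number 6
```

The hexagonal lattice achieves kissing number 6, the best possible in the plane, while Z2 only achieves 4.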
Exercise 609 Show that in a lattice packing, the number of spheres touching any sphere
is the kissing number.
Exercise 610 Do the following:
(a) What is the radius of the spheres in a lattice packing using the lattices Z2 and 2 from
Examples 10.6.1 and 10.6.2?
(b) What is the kissing number of each of these lattices?
We now present a general construction, known as Construction A, of lattices directly
from binary codes. Let C be an [n, k, d] binary code. The lattice Λ(C) is the set of all x in
R^n obtained from a codeword in C by viewing the codeword as an integer vector with 0s
and 1s, adding even integers to any of its components, and dividing the resulting vector by
√2. In short,
Λ(C) = {x ∈ R^n | √2 x (mod 2) ∈ C}.
The following can be found in [60, Chapter 7].
Theorem 10.6.4 Let C be an [n, k, d] binary code. The following hold:
(i) If d ≤ 4, the minimum norm µ of Λ(C) is µ = d/2; if d > 4, µ = 2.
(ii) det Λ(C) = 2^{n−2k}.
(iii) Λ(C⊥) = Λ(C)*.
(iv) Λ(C) is integral if and only if C is self-orthogonal.
(v) Λ(C) is Type I if and only if C is Type I.
(vi) Λ(C) is Type II if and only if C is Type II.
Proof: We leave the proofs of (i) and (vi) as an exercise. Without loss of generality, we
may assume that G = [Ik B] is a generator matrix for C. Then, clearly,
M = (1/√2) [ Ik   B     ]
           [ O    2In−k ]
and
A = (1/2) [ Ik + BB^T   2B    ]
          [ 2B^T        4In−k ]
are the generator and Gram matrices for Λ(C), respectively. Hence
det A = (1/2^n) det [ Ik + BB^T   2B    ]
                    [ 2B^T        4In−k ]
      = (2^{n−k}/2^n) det [ Ik + BB^T   2B    ]
                          [ B^T         2In−k ]
      = 2^{−k} det [ Ik    O     ]
                   [ B^T   2In−k ]
      = 2^{−k} · 2^{n−k} = 2^{n−2k},
yielding (ii); in the middle step we subtracted B times the bottom block of rows from the
top block.
As G⊥ = [B^T In−k] is the generator matrix of C⊥, Λ(C⊥) has generator matrix
M⊥ = (1/√2) [ B^T   In−k ]
            [ 2Ik   O    ].
The (binary) inner product of a row of G with a row of G⊥ is 0; hence we see that the
(real) inner product of a row of M and a row of M⊥ is an integer. This proves that M⊥M^T
is an integer matrix and Λ(C⊥) ⊆ Λ(C)*. To complete (iii) we must show Λ(C)* ⊆ Λ(C⊥).
Let y ∈ Λ(C)*. Then yM^T ∈ Z^n by Theorem 10.6.3(i). So there exists z ∈ Z^n with
y = z(M^T)^−1 = z(M^T)^−1(M⊥)^−1 M⊥ = z(M⊥M^T)^−1 M⊥. As M⊥M^T is an integer matrix
and det(M⊥M^T) = det(M⊥) det(M^T) = (±2^{−n/2} · 2^k) · (2^{(n−2k)/2}) = ±1,
(M⊥M^T)^−1 = (1/det(M⊥M^T)) adj(M⊥M^T)
is an integer matrix. Thus y = z′M⊥ for some z′ ∈ Z^n. Hence y ∈ Λ(C⊥), completing (iii).
The (real) inner product of two rows of M is always an integer if and only if the (binary)
inner product of two rows of G is 0, proving (iv).
Suppose Λ(C) is Type I; then det Λ(C) = ±1, implying k = n/2 by (ii). As Λ(C) is
integral, C ⊆ C⊥ by (iv). Hence C is self-dual. The (real) inner product of some lattice point
with itself is an odd integer; the corresponding codeword must be singly-even, implying
that C is Type I. Conversely, suppose C is Type I. By (iv) Λ(C) is integral and by (iii)
Λ(C) = Λ(C⊥) = Λ(C)*. Hence Λ(C) is unimodular. In addition, a singly-even codeword
corresponds to a lattice point with odd integer norm. This proves (v).
Exercise 611 Prove Theorem 10.6.4(i) and (vi).
Example 10.6.5 Let C be the [8, 4, 4] extended Hamming code (denoted e8 in Section 9.7).
By Theorem 10.6.4, Λ(C) is a Type II lattice with minimum norm 2. We ask you to verify
this directly in Exercise 612. Up to equivalence, this lattice, which is usually denoted E8,
is known to be the unique Type II lattice in R^8. We can determine precisely the lattice
points of minimum norm 2. One can add ±2 to any coordinate of the zero vector, yielding
16 lattice points of norm 2 with “shape” (1/√2)(±2, 0^7), indicating any vector with seven
0s and one ±2/√2. Beginning with any of the 14 weight 4 codewords, one can add −2
to any of the four nonzero coordinates, obtaining 14 · 16 = 224 lattice points of norm 2
with shape (1/√2)((±1)^4, 0^4). These are all of the 240 lattice points of norm 2. Hence, E8
has kissing number 240. It is known that no other 8-dimensional lattice can have a higher
kissing number. The theta series for E8 is
Θ_E8(q) = 1 + 240q^2 + 2160q^4 + 6720q^6 + 17520q^8 + 30240q^10 + · · ·
        = 1 + 240 Σ_{m=1}^{∞} σ3(m) q^{2m},
where σ3(m) = Σ_{d|m} d^3. It is also known that the lattice E8 yields the densest possible
lattice packing in R^8, with spheres of radius 1/√2, where the density is the fraction of the
space covered by the spheres; the density of E8 is π^4/384 ≈ 0.253 669 51. It seems that
T. Gosset [110], in 1900, was the first to study E8, along with 6- and 7-dimensional lattices
denoted E6 and E7; hence these are sometimes called Gosset lattices.
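The counts 240 and 2160 can be reproduced directly from Construction A; the matrix B below is one standard choice of generator matrix for e8 (any equivalent choice gives the same counts):

```python
from itertools import product

# [8,4,4] extended Hamming code e8, generator [I4 | B] with B = J - I
B = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
G = [[int(c == r) for c in range(4)] + B[r] for r in range(4)]
code = {tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
        for msg in product(range(2), repeat=4)}

# Construction A: sqrt(2)*x = c + 2z with c in C and z integral, so the norm
# of x is |c + 2z|^2 / 2.  Components in -2..2 capture every point of norm <= 4.
counts = {}
for w in product(range(-2, 3), repeat=8):
    if tuple(x % 2 for x in w) in code:
        norm = sum(x * x for x in w) // 2
        if 0 < norm <= 4:
            counts[norm] = counts.get(norm, 0) + 1

print({k: counts[k] for k in sorted(counts)})   # {2: 240, 4: 2160}
```

The norm-2 count is the kissing number 240, and the norm-4 count 2160 matches the coefficient of q^4 in the theta series.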
Exercise 612 Let e8 be the [8, 4, 4] extended binary Hamming code.
(a) Give a generator matrix for e8 in the form [I4 B].
(b) Give the generator and Gram matrices for (e8 ).
(c) Show directly, without quoting Theorem 10.6.4, that (e8 ) is integral with determinant 1.
(d) Show directly, without quoting Theorem 10.6.4, that (e8 ) is Type II with minimum
norm 2.
Exercise 613 Find the shapes of each of the 2160 lattice points in E 8 of norm 4.
The disadvantage of Construction A is that the largest minimum norm of a resulting
lattice is 2. There are other constructions of lattices from codes that produce lattices of great
interest. To conclude this section, we present one construction, due to J. H. Conway [60,
Chapter 10], of the 24-dimensional Leech lattice Λ24 using the [24, 12, 8] extended binary
Golay code. This lattice was discovered in 1965 by J. Leech [188]. The lattice Λ24 is known
to have the highest kissing number (196 560) of any 24-dimensional lattice. Furthermore,
Λ24 is the unique Type II lattice of dimension 24 with minimum norm 4, the highest norm
possible for Type II lattices in R^24.
We need some notation. Label the coordinates Ω = {1, 2, . . . , 24} and let {e1, e2, . . . , e24}
be the standard orthonormal basis of coordinate vectors in R^24. If S ⊆ Ω, let eS = Σ_{i∈S} ei.
Fix some representation of the [24, 12, 8] extended binary Golay code G24. A subset S of
Ω is called a G24-set if it is the support of a codeword in G24. Recall from Example 8.3.2
that the subsets of Ω of size 4 are called tetrads, and that each tetrad T uniquely determines
five other tetrads, all six being pairwise disjoint, so that the union of any two tetrads is a
G24-set of size 8; these six tetrads are called the sextet determined by T. We let T denote
the set of all tetrads and O denote the set of 759 octads, that is, the set of all G24-sets of size
8. Recall from either Example 8.3.2 or the proof of the uniqueness of G24 in Section 10.1
that the codewords of weight 8 in G24 span G24.
We first construct an intermediate lattice Ŵ0 , then add one more spanning vector to form
a lattice Ŵ1 , and finally rescale to obtain 24 . Define Ŵ0 to be the 24-dimensional lattice
spanned by {2e S | S ∈ O}.
Lemma 10.6.6 The following hold in the lattice Ŵ0 :
(i) Every lattice point has even integer components.
(ii) Ŵ0 contains all vectors 4eT , where T ∈ T .
(iii) Ŵ0 contains 4ei − 4e j for all i and j in ".
(iv) A vector in R24 with all even integer components is in Ŵ0 if and only if the sum of
its components is a multiple of 16 and the set of coordinates in the vector where the
components are not divisible by 4 forms a G 24 -set.
Proof: Part (i) is immediate as 2e_S with S ∈ O has all even integer components. Let T =
T0, T1, . . . , T5 be the sextet determined by the tetrad T. Since 4e_T = 2e_{T∪T1} + 2e_{T∪T2} −
2e_{T1∪T2}, we have (ii). Now let T = {i, x, y, z} ∈ T and T′ = {j, x, y, z} ∈ T. Then 4e_T −
4e_{T′} = 4e_i − 4e_j, showing (iii).
Let v ∈ Ŵ0. Since a spanning vector 2e_S for S ∈ O has its sum of components equaling
16, the sum of the components of v is a multiple of 16. If we take v, divide its components
by 2 and reduce modulo 2, we obtain a binary combination of some of the e_S's where S ∈ O,
which can be viewed as codewords in G24 and hence must sum to e_C for some G24-set C;
but C is precisely the set of coordinates where the components of v are not divisible by 4.
Conversely, let v be a vector in R^24 with all even integer components such that the sum of
its components is a multiple of 16 and the set of coordinates in v where the components are
not divisible by 4 forms a G24-set C. Let c ∈ G24 have support C; then c = c1 + · · · + cr
where each c_i is a weight 8 codeword of G24, as the weight 8 codewords span G24. Let
S_i = supp(c_i). Define x = Σ_{i=1}^{r} 2e_{S_i} ∈ Ŵ0. Clearly, the coordinates of x that are 2 modulo
4 are precisely the coordinates of C and the rest are 0 modulo 4. Therefore the vector
v − x has sum of components a multiple of 16 and all components are divisible by 4. By
Exercise 614 and parts (ii) and (iii), v − x ∈ Ŵ0 and hence v ∈ Ŵ0 .
Exercise 614 Show that the vectors in R^24 whose components are multiples of 4 and
whose component sum is a multiple of 16 are precisely the vectors in the lattice spanned
by 4e_T for T ∈ T and 4e_i − 4e_j for i and j in Ω.
The lattice Ŵ1 is defined to be the lattice spanned by Ŵ0 and the vector s =
(−3, 1, 1, . . . , 1). Notice that we obtain a vector with −3 in coordinate i and 1s elsewhere
by adding 4e1 − 4ei to s.
Lemma 10.6.7 The vector v = v1 v2 · · · v24 is in Ŵ1 if and only if the following three conditions all hold:
(i) The components v_i are all congruent to the same value m modulo 2.
(ii) The set of i for which the components v_i are congruent to the same value modulo 4
forms a G24-set.
(iii) Σ_{i=1}^{24} v_i ≡ 4m (mod 8), where m is defined in (i).
Furthermore, if x and y are in Ŵ1 , then x · y is a multiple of 8 and x · x is a multiple of 16.
Proof: Statements (i), (ii), and (iii) hold for the vectors spanning Ŵ1 and hence for any
vector in Ŵ1. The final statement also holds for vectors in Ŵ1 as it too holds for the vectors
spanning Ŵ1, noting that if x = Σ_i x_i, then x · x = Σ_i x_i · x_i + 2 Σ_{i<j} x_i · x_j.
Suppose that v satisfies (i), (ii), and (iii). Then we can subtract αs from v where α is one
of 0, 1, 2, or 3, so that v − αs has only even coordinates and its component sum is a multiple
of 16. By (ii), the set of coordinates in v − αs where the components are not divisible by 4
forms a G 24 -set. Hence v − αs is in Ŵ0 by Lemma 10.6.6 and the proof is complete.
The lattice Ŵ1 is not self-dual. We obtain Λ24 by multiplying all vectors in Ŵ1 by 1/√8.
Thus
Λ24 = (1/√8) Ŵ1.
By Lemma 10.6.7, the inner product of any two vectors in Λ24 is an integer, making Λ24
integral. By the same lemma, the norm of a vector in Λ24 is an even integer. It can be shown
that det Λ24 = ±1, implying that Λ24 is a Type II lattice. By Exercise 615, its minimum
norm is 4 and its kissing number is 196 560.
Exercise 615 Upon scaling a vector in Ŵ1 by multiplying by 1/√8, the norm of the vector
is reduced by a factor of eight. In this exercise you will show that no vector in Λ24 has norm
2 and also find the vectors in Λ24 of norm 4. This is equivalent to showing that no vector in
Ŵ1 has norm 16 and finding those vectors in Ŵ1 of norm 32.
(a) Using the characterization of Ŵ1 in Lemma 10.6.7, show that Ŵ1 has no vectors of
norm 16.
(b) Show that a vector of shape ((±2)^8, 0^16) with an even number of minus signs and the
±2s in coordinates forming an octad is in Ŵ1.
(c) Show that there are 2^7 · 759 vectors described in part (b).
(d) Show that a vector of shape ((∓3), (±1)^23) with the lower signs in a G24-set is in Ŵ1.
(e) Show that there are 2^12 · 24 vectors described in part (d).
(f) Show that a vector of shape ((±4)^2, 0^22) with the plus and minus signs assigned arbitrarily is in Ŵ1.
(g) Show that there are 2 · 24 · 23 vectors described in part (f).
(h) Show that a vector in Ŵ1 of norm 32 is one of those found in part (b), (d), or (f).
(i) Show that the minimum norm of Λ24 is 4 and that the kissing number is 196 560.
There are a number of other constructions for Λ24, including one from the [24, 12, 9]
symmetry code over F3; see Section 5.7 of Chapter 5 in [60]. Lattices over the complex
numbers can also be defined; the [12, 6, 6] extended ternary Golay code can be used to
construct the 12-dimensional complex version of the Leech lattice; see Section 8 of Chapter 7
in [60].
11
Covering radius and cosets
In this chapter we examine in more detail the concept of covering radius first introduced in
Section 1.12. In our study of codes, we have focused on codes with high minimum distance
in order to have good error-correcting capabilities. An [n, k, d] code C over F_q can correct
t = ⌊(d − 1)/2⌋ errors. Thus spheres in F_q^n of radius t centered at codewords are pairwise
disjoint, a fact that fails for spheres of larger radius; t is called the packing radius of C. It
is natural to explore the opposite situation of finding the smallest radius ρ(C), called the
covering radius, of spheres centered at codewords that completely cover the space F_q^n; that
is, every vector in F_q^n is in at least one of these spheres. Alternately, given a radius, it is
mathematically interesting to find the centers of the fewest number of spheres of that radius
which cover the space. When decoding C, if t or fewer errors are made, the received vector
can be uniquely decoded. If the number of errors is more than t but no more than the covering
radius, sometimes these errors can still be uniquely corrected. A number of applications
have arisen for the coverings arising from codes, ranging from data compression to football
pools. In general, if you are given a code, it is difficult to find its covering radius, and the
complexity of this has been investigated; see [51].
In this chapter we will examine a portion of the basic theory and properties of the covering
radius. We will also discuss what is known about the covering radius of specific codes or
families of codes. Most of the work in this area has been done for binary codes, both linear
and nonlinear. The reader interested in exploring covering radius in more depth should
consult [34, 51].
11.1
Basics
We begin with some notation and recall some basic facts from Section 1.12. We also present
a few new results that we will need in later sections.
Let C be a code over F_q of length n that is possibly nonlinear. We say that a vector in
F_q^n is ρ-covered by C if it has distance ρ or less from at least one codeword in C. In this
terminology the covering radius ρ = ρ(C) of C is the smallest integer ρ such that every
vector in F_q^n is ρ-covered by C. Equivalently,
ρ(C) = max_{x∈F_q^n} min_{c∈C} d(x, c).   (11.1)
Exercise 616 Show that the definition of the covering radius of C given by (11.1) is
equivalent to the definition that ρ = ρ(C) is the smallest integer ρ such that every vector in
F_q^n is ρ-covered by C.
Exercise 617 Find the covering radius and the packing radius of the [n, 1, n] repetition
code over Fq .
Exercise 618 Prove that if C 1 ⊆ C 2 , then ρ(C 2 ) ≤ ρ(C 1 ).
The definitions of covering radius and packing radius lead directly to a relationship
between the two as we noted in Section 1.12.
Theorem 11.1.1 Let C be a code of minimum distance d. Then
ρ(C) ≥ ⌊(d − 1)/2⌋.
Furthermore, we have equality if and only if C is perfect.
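For a small example of the equality case, the sketch below (an illustration, not part of the book) computes the covering radius of the [7, 4, 3] binary Hamming code directly from (11.1); the generator matrix used is one standard choice, not necessarily the book's.

```python
# Rows of a generator matrix of the [7, 4, 3] Hamming code, as 7-bit integers.
G = [0b1000011, 0b0100101, 0b0010110, 0b0001111]

# All 16 codewords as GF(2) combinations of the rows.
codewords = []
for m in range(1 << 4):
    c = 0
    for i, row in enumerate(G):
        if (m >> i) & 1:
            c ^= row
    codewords.append(c)

def covering_radius(code, n):
    # Direct evaluation of rho(C) = max over x of min over c of d(x, c).
    return max(min(bin(x ^ c).count("1") for c in code) for x in range(1 << n))

d = min(bin(c).count("1") for c in codewords if c)   # minimum distance
rho = covering_radius(codewords, 7)
```

Here ρ(C) = ⌊(d − 1)/2⌋ = 1, so by Theorem 11.1.1 the code is perfect, as expected.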
Another equivalent formulation of the covering radius is stated in Theorem 1.12.5; we
restate this result but generalize slightly the first part to apply to nonlinear as well as linear
codes. In order to simplify terminology, we will call the set v + C a coset of C with coset
representative v even if C is nonlinear. Care must be taken when using cosets with nonlinear
codes; for example, distinct cosets need not be disjoint, as Exercise 619 shows.
Theorem 11.1.2 Let C be a code. The following hold:
(i) ρ(C) is the largest value of the minimum weight of all cosets of C.
(ii) If C is an [n, k] linear code over F_q with parity check matrix H, then ρ(C) is the smallest
number s such that every nonzero vector in F_q^{n−k} is a combination of s or fewer columns
of H.
Exercise 619 Find a nonlinear code and a pair of cosets of that code that are neither equal
nor disjoint.
Exercise 620 Let C be an [n, k, d] linear code over Fq with covering radius ρ(C) ≥ d.
Explain why there is a linear code with more codewords than C but the same error-correcting
capability.
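Part (i) of Theorem 11.1.2 can be illustrated on the [5, 1, 5] binary repetition code (a toy example, not from the book): partition F_2^5 into cosets, take each coset's minimum weight, and compare the maximum against the direct formula (11.1).

```python
n = 5
C = [0b00000, 0b11111]   # the [5, 1, 5] binary repetition code

# Minimum weight of every coset v + C.
seen, coset_weights = set(), []
for v in range(1 << n):
    if v in seen:
        continue
    coset = {v ^ c for c in C}
    seen |= coset
    coset_weights.append(min(bin(u).count("1") for u in coset))

rho_cosets = max(coset_weights)   # Theorem 11.1.2(i)
rho_direct = max(min(bin(x ^ c).count("1") for c in C) for x in range(1 << n))
```

Both computations give ρ = 2 = ⌊5/2⌋, consistent with Exercise 617.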
An obvious corollary of part (ii) of Theorem 11.1.2 is the following upper bound on the
covering radius of a linear code, called the Redundancy Bound.
Corollary 11.1.3 (Redundancy Bound) Let C be an [n, k] code. Then ρ(C) ≤ n − k.
Exercise 621 Prove Corollary 11.1.3.
Recall from (1.11) that a sphere of radius r in F_q^n contains
Σ_{i=0}^{r} (n choose i)(q − 1)^i
vectors.
vectors. The following is a lower bound on the covering radius, called the Sphere Covering
Bound; its proof is left as an exercise.
Theorem 11.1.4 (Sphere Covering Bound) If C is a code of length n over F_q, then
ρ(C) ≥ min{ r | |C| Σ_{i=0}^{r} (n choose i)(q − 1)^i ≥ q^n }.
Exercise 622 Prove Theorem 11.1.4.
Exercise 623 Let C be a [7, 3, 4] binary code. Give the upper and lower bounds for this
code from the Redundancy Bound and the Sphere Covering Bound.
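For comparison, both bounds are mechanical to compute; a sketch (an illustration, not from the book) for the parameters of Exercise 623:

```python
from math import comb

n, k, q = 7, 3, 2

# Redundancy Bound (Corollary 11.1.3): rho <= n - k.
upper = n - k

# Sphere Covering Bound (Theorem 11.1.4): smallest r such that |C| spheres of
# radius r contain at least q^n vectors in total.
M = q ** k
r = 0
while M * sum(comb(n, i) * (q - 1) ** i for i in range(r + 1)) < q ** n:
    r += 1
lower = r
```

For a [7, 3, 4] binary code this brackets the covering radius between 2 and 4.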
Another lower bound can be obtained from the following lemma, known as the Supercode
Lemma.
Lemma 11.1.5 (Supercode Lemma) If C and C ′ are linear codes with C ⊆ C ′ , then
ρ(C) ≥ min{wt(x) | x ∈ C ′ \ C}.
Proof: Let x be a vector in C ′ \ C of minimum weight. Such a vector must be a coset leader
of C as C ⊆ C ′ ; therefore ρ(C) ≥ wt(x) and the result follows.
Theorem 1.12.6 will be used repeatedly, and we restate it here for convenience.
Theorem 11.1.6 Let C be an [n, k] code over F_q. Let Ĉ be the extension of C, and let C∗ be
a code obtained from C by puncturing on some coordinate. The following hold:
(i) If C = C1 ⊕ C2, then ρ(C) = ρ(C1) + ρ(C2).
(ii) ρ(C∗) = ρ(C) or ρ(C∗) = ρ(C) − 1.
(iii) ρ(Ĉ) = ρ(C) or ρ(Ĉ) = ρ(C) + 1.
(iv) If q = 2, then ρ(Ĉ) = ρ(C) + 1.
(v) Assume that x is a coset leader of C. If x′ ∈ F_q^n is a vector all of whose nonzero
components agree with the same components of x, then x′ is also a coset leader of C. In
particular, if there is a coset of weight s, there is also a coset of any weight less than s.
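Part (iv) of Theorem 11.1.6 can be seen on the smallest repetition code (an illustrative sketch, not from the book): extending the binary [3, 1, 3] code increases the covering radius by exactly one.

```python
def covering_radius(code, n):
    return max(min(bin(x ^ c).count("1") for c in code) for x in range(1 << n))

def extend(c):
    # Append an overall parity bit.
    return (c << 1) | (bin(c).count("1") & 1)

C = [0b000, 0b111]               # the [3, 1, 3] binary repetition code
C_ext = [extend(c) for c in C]   # its extension, the [4, 1, 4] code

rho = covering_radius(C, 3)
rho_ext = covering_radius(C_ext, 4)
```

The computation gives ρ(C) = 1 and ρ(Ĉ) = 2, as (iv) predicts for q = 2.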
Exercise 624 This exercise is designed to find the covering radius of a [7, 3, 4] binary
code C.
(a) Show that a [7, 3, 4] binary code is unique up to equivalence.
(b) Apply the Supercode Lemma to obtain a lower bound on ρ(C) using C ′ equal to the
[7, 4, 3] Hamming code; make sure you justify that C is indeed a subcode of C ′ .
(c) Show that there is no [7, 4, 4] binary code.
(d) Find ρ(C) and justify your answer.
Exercise 625 Let C be an even binary code. Show that if C ∗ is obtained from C by puncturing
on one coordinate, then ρ(C ∗ ) = ρ(C) − 1.
Recall from Theorem 11.1.1 that the covering radius of a code equals the packing radius
of the code if and only if the code is perfect. We have the following result dealing with
extended perfect codes and codes whose covering radius is one larger than the packing
radius.
Theorem 11.1.7 If C is an [n, k, d] extended perfect code, then ρ(C) = d/2. Furthermore,
if C is an [n, k, d] binary code, then C is an extended perfect code if and only if ρ(C) = d/2.
Exercise 626 Prove Theorem 11.1.7.
Exercise 627 Let C be an [n, k] binary code. Define
C1 = {(c1 + c2 + · · · + cn, c2, . . . , cn) | (c1, c2, . . . , cn) ∈ C}.
(a) Prove that Ĉ and Ĉ1 are equivalent.
(b) Prove that C1 is k-dimensional.
(c) Prove that ρ(C) = ρ(C1).
11.2
The Norse Bound and Reed–Muller codes
Another pair of upper bounds, presented in the next theorem, are called the Norse Bounds;
they apply only to binary codes. The Norse Bounds are so named because they were discovered by the Norwegians Helleseth, Kløve, and Mykkeltveit [126]. They will be used to
help determine the covering radius of some of the Reed–Muller codes. Before continuing,
we need some additional terminology.
Let C be an (n, M) binary code. Let M be an M × n matrix whose rows are the codewords of C in some order. We say that C has strength s, where 1 ≤ s ≤ n, provided that in
each set of s columns of M each binary s-tuple occurs the same number M/2^s of times.
Exercise 628 gives a number of results related to the strength of a binary code. The code C
is self-complementary provided for each codeword c ∈ C, the complementary vector 1 + c,
obtained from c by replacing 1 by 0 and 0 by 1, is also in C. Exercise 629 gives a few results
about self-complementary codes.
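The strength condition can be tested by brute force. The sketch below (an illustration, not from the book) checks it for R(1, 3), whose dual R(1, 3)⊥ = R(1, 3) has minimum weight 4; by Exercise 628(c) the code should have strength 3 but not strength 4.

```python
from collections import Counter
from itertools import combinations, product

# Generator matrix of R(1, 3): the all-one row and the three coordinate
# functions on F_2^3, with column j corresponding to the binary point j.
G = [0b11111111, 0b00001111, 0b00110011, 0b01010101]

codewords = []
for m in range(1 << 4):
    c = 0
    for i, row in enumerate(G):
        if (m >> i) & 1:
            c ^= row
    codewords.append(c)

def has_strength(code, n, s):
    """True when every s columns carry each binary s-tuple exactly |code|/2^s times."""
    target = len(code) // 2 ** s
    for cols in combinations(range(n), s):
        counts = Counter(tuple((c >> j) & 1 for j in cols) for c in code)
        if any(counts[t] != target for t in product((0, 1), repeat=s)):
            return False
    return True
```

The check confirms strength 3 (and hence, by Exercise 628(a), strengths 2 and 1) but rules out strength 4.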
Exercise 628 Let C be an (n, M) binary code whose codewords are the rows of the M × n
matrix M. Prove the following:
(a) If C has strength s ≥ 2, then C also has strength s − 1.
(b) If C has strength s, then so does each coset v + C.
(c) If C is linear, then C has strength s if and only if every s columns of a generator matrix
of C are linearly independent if and only if the minimum weight of C ⊥ is at least s + 1.
Hint: See Corollary 1.4.14.
Exercise 629 Let C be a self-complementary binary code. Prove the following:
(a) Each coset v + C is self-complementary.
(b) If C is also linear, C ⊥ is an even code.
We now state the Norse Bounds.
Theorem 11.2.1 (Norse Bounds) Let C be an (n, M) binary code.
(i) If C has strength 1, then ρ(C) ≤ ⌊n/2⌋.
(ii) If C has strength 2 and C is self-complementary, then ρ(C) ≤ ⌊(n − √n)/2⌋.
Proof: Let M be an M × n matrix whose rows are the codewords of C. First assume that
C has strength 1. For v ∈ F_2^n let M_v be an M × n matrix whose rows are the codewords of
v + C. By Exercise 628(b), v + C also has strength 1. Therefore each column of M_v has
exactly half its entries equal to 0 and half equal to 1, implying that
Σ_{u∈v+C} wt(u) = nM/2,
or equivalently,
Σ_{i=0}^{n} i A_i(v) = nM/2,   (11.2)
where {Ai (v) | 0 ≤ i ≤ n} is the weight distribution of v + C. Equation (11.2) implies that
the average distance between v and the codewords of C is n/2. Therefore the distance
between v and some codeword c ∈ C is at most n/2. By Theorem 11.1.2(i) we have part (i).
Now assume that C, and hence v + C, has strength 2. In each row of M_v corresponding
to the vector u ∈ v + C, there are (wt(u) choose 2) pairs of 1s. But each of the (n choose 2) pairs of
distinct columns of M_v contains the binary 2-tuple (1, 1) exactly M/4 times. Therefore
Σ_{u∈v+C} (wt(u) choose 2) = Σ_{u∈v+C} wt(u)(wt(u) − 1)/2 = (n choose 2) M/4 = n(n − 1)M/8.
Using (11.2) we obtain
Σ_{i=0}^{n} i^2 A_i(v) = n(n + 1)M/4.   (11.3)
Combining (11.2) and (11.3) produces
Σ_{i=0}^{n} (2i − n)^2 A_i(v) = nM.   (11.4)
This implies that there is an i with A_i(v) ≠ 0 such that (2i − n)^2 ≥ n. Taking the square
root of both sides of this inequality shows that either
i ≤ (n − √n)/2 or i ≥ (n + √n)/2.
Suppose now that C is self-complementary. By Exercise 629, v + C is also self-complementary. Then there is an i satisfying the first of the above two inequalities with
A_i(v) ≠ 0. Thus there is a vector u ∈ v + C with wt(u) ≤ (n − √n)/2. Hence (ii) holds.
In the case of binary linear codes, the Norse Bounds can be restated as follows.
Corollary 11.2.2 Let C be a binary linear code of length n.
(i) If C⊥ has minimum distance at least 2, then ρ(C) ≤ ⌊n/2⌋.
(ii) If C⊥ has minimum distance at least 4 and the all-one vector 1 is in C, then ρ(C) ≤
⌊(n − √n)/2⌋.
Exercise 630 Prove Corollary 11.2.2.
Exercise 631 Let C be a self-complementary binary code of strength 2 with ρ(C) =
(n − √n)/2, and let v + C have minimum weight ρ(C). Show that half the vectors in v + C
have weight (n − √n)/2 and half the vectors have weight (n + √n)/2. Hint: Examine
(11.4).
We are now ready to examine the covering radius of the Reed–Muller codes R(r, m)
first defined in Section 1.10. Let ρRM(r, m) denote the covering radius of R(r, m). Using
Exercise 617, ρRM(0, m) = 2^{m−1} as R(0, m) is the binary repetition code. Also ρRM(m, m) =
0 as R(m, m) = F_2^{2^m}. We consider the case 1 ≤ r < m. By Theorem 1.10.1, R(r, m)⊥ =
R(m − r − 1, m) has minimum weight 2^{r+1} ≥ 4; therefore R(r, m) has strength 2 by
Exercise 628. In addition R(r, m) is self-complementary as 1 ∈ R(0, m) ⊂ R(r, m), also
by Theorem 1.10.1. Therefore by the Norse Bound (ii) we have the following theorem.
Theorem 11.2.3 Let 1 ≤ r < m. Then ρRM(r, m) ≤ 2^{m−1} − 2^{(m−2)/2}. In particular,
ρRM(1, m) ≤ 2^{m−1} − 2^{(m−2)/2}.
Notice that by Exercise 618 and the nesting properties of the Reed–Muller codes, any
upper bound on ρRM (1, m) is automatically an upper bound on ρRM (r , m) for 1 < r .
We now find a recurrence relation for the covering radius of first order Reed–Muller codes
that, combined with Theorem 11.2.3, will allow us to get an exact value for ρRM (1, m) when
m is even.
Lemma 11.2.4 For m ≥ 2,
ρRM(1, m) ≥ 2ρRM(1, m − 2) + 2^{m−2}.
Proof: For a vector x ∈ F_2^n, let x̄ = 1 + x denote the complement of x. From Exercise 632,
every codeword in R(1, m) has one of the following forms:
(a) (x|x|x|x),
(b) (x|x̄|x|x̄),
(c) (x|x|x̄|x̄), or
(d) (x|x̄|x̄|x),
where x ∈ R(1, m − 2). Choose a vector v ∈ F_2^{2^{m−2}} such that the coset v + R(1, m − 2)
has weight equaling ρRM(1, m − 2). Therefore,
d(v, x) ≥ ρRM(1, m − 2)   (11.5)
for all x ∈ R(1, m − 2). Notice that
d(v, x) = d(v̄, x̄) = 2^{m−2} − d(v̄, x) = 2^{m−2} − d(v, x̄).   (11.6)
We prove the lemma if we find a vector y in F_2^{2^m} which has distance at least 2ρRM(1, m −
2) + 2^{m−2} from all codewords c ∈ R(1, m). Let y = (v|v|v|v̄). We examine each possibility
(a) through (d) above for c. For the corresponding case, using (11.5), (11.6), and the fact
that x̄ ∈ R(1, m − 2) if x ∈ R(1, m − 2), we have:
(a) d(y, c) = 3d(v, x) + d(v̄, x) = 2^{m−2} + 2d(v, x) ≥ 2^{m−2} + 2ρRM(1, m − 2),
(b) d(y, c) = 2d(v, x) + d(v, x̄) + d(v̄, x̄) = 3d(v, x) + d(v, x̄) = 2^{m−2} + 2d(v, x) ≥
2^{m−2} + 2ρRM(1, m − 2),
(c) d(y, c) = 2d(v, x) + d(v, x̄) + d(v̄, x̄) = 3d(v, x) + d(v, x̄) = 2d(v, x) + 2^{m−2} ≥
2ρRM(1, m − 2) + 2^{m−2}, and
(d) d(y, c) = d(v, x) + 2d(v, x̄) + d(v̄, x) = d(v, x) + 3d(v, x̄) = 2d(v, x̄) + 2^{m−2} ≥
2ρRM(1, m − 2) + 2^{m−2}.
The result follows.
Exercise 632 Show that every codeword in R(1, m) has one of the following forms:
(a) (x|x|x|x),
(b) (x|x̄|x|x̄),
(c) (x|x|x̄|x̄), or
(d) (x|x̄|x̄|x),
where x ∈ R(1, m − 2). Hint: See Section 1.10.
Exercise 633 Do the following:
(a) Show that ρRM(1, 1) = 0.
(b) Show that if m is odd, then ρRM(1, m) ≥ 2^{m−1} − 2^{(m−1)/2}. Hint: Use (a) and
Lemma 11.2.4.
(c) Show that ρRM(1, 2) = 1.
(d) Show that if m is even, then ρRM(1, m) ≥ 2^{m−1} − 2^{(m−2)/2}. Hint: Use (c) and
Lemma 11.2.4.
Combining Theorem 11.2.3 and Exercise 633(d) yields the exact value for ρRM (1, m)
when m is even.
Theorem 11.2.5 For m even, ρRM(1, m) = 2^{m−1} − 2^{(m−2)/2}.
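For m = 4 the theorem can be confirmed by exhaustive search (a sketch, not from the book): build R(1, 4) from its five generators and evaluate (11.1) over all of F_2^16.

```python
n = 16
# Generators of R(1, 4): the all-one vector plus the four coordinate functions,
# where bit j of a generator holds the function's value at the point j of F_2^4.
rows = [(1 << n) - 1] + [sum(((j >> i) & 1) << j for j in range(n)) for i in range(4)]

codewords = []
for m in range(1 << 5):          # all GF(2) combinations of the five generators
    c = 0
    for i, row in enumerate(rows):
        if (m >> i) & 1:
            c ^= row
    codewords.append(c)

POP = bytes(bin(i).count("1") for i in range(1 << n))   # popcount lookup table
rho = max(min(POP[x ^ c] for c in codewords) for x in range(1 << n))
```

The search returns ρRM(1, 4) = 6 = 2^3 − 2^1, matching the theorem and Table 11.1.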
When m is odd, the inequality in Exercise 633(b) is an equality for the first four values
of m. By Exercise 633(a), ρRM (1, 1) = 0. As R(1, 3) is the [8, 4, 4] extended Hamming
code, ρRM (1, 3) = 2 by Theorem 11.1.7. It is not difficult to show that ρRM (1, 5) = 12;
one theoretical proof is given as a consequence of Lemma 9.4 in [34], or a computer
package that calculates covering radius could also verify this. Mykkeltveit [242] showed
that ρRM (1, 7) = 56. From these four cases one would conjecture that the lower bound in
Exercise 633(b) for m odd is an equality. This, however, is false; when m = 15, the covering
radius of R(1, 15) is at least 16 276 = 2^{14} − 2^{7} + 20 [250, 251].
There are bounds on ρRM (r, m) for 1 < r but relatively few results giving exact values
for ρRM (r, m). One such result is found in [237].
Theorem 11.2.6
ρRM(m − 3, m) = m + 2 for m even, and m + 1 for m odd.
Table 11.1, found in [34], gives exact values and bounds on ρRM (r, m) for 1 ≤ r ≤ m ≤ 9.
References for the entries can be found in [34].
Exercise 634 A computer algebra package will be useful for this problem. Show that
the Sphere Covering Bound gives the lower bounds on ρRM (r, m) found in Table 11.1
for:
(a) (r, m) = (3, 8) and
(b) (r, m) = (r, 9) when 2 ≤ r ≤ 5.
Table 11.1 Bounds on ρRM(r, m) for 1 ≤ r ≤ m ≤ 9

r\m   1   2   3   4   5    6    7        8         9
1     0   1   2   6   12   28   56       120       240–244
2         0   1   2   6    18   40–44    84–100    171–220
3             0   1   2    8    20–23    43–67     111–167
4                 0   1    2    8        22–31     58–98
5                      0   1    2        10        23–41
6                           0   1        2         10
7                                0       1         2
8                                        0         1
9                                                  0
Shortening a first order Reed–Muller code produces a binary simplex code; see
Exercise 62. While we do not have exact values for the covering radius of all first order
Reed–Muller codes, we do have exact values for the covering radius of all binary simplex
codes.
Theorem 11.2.7 The covering radius ρ of the [2^m − 1, m, 2^{m−1}] binary simplex code S_m
is 2^{m−1} − 1.
Proof: As no coordinate of S_m is identically 0, S_m has strength 1 by Exercise 628(c). Hence
by the Norse Bound, ρ ≤ 2^{m−1} − 1. By Theorem 2.7.5, all nonzero codewords of S_m have
weight 2^{m−1}. Hence the coset of S_m containing the all-one vector has weight 2^{m−1} − 1.
Therefore 2^{m−1} − 1 ≤ ρ.
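For m = 3 the theorem is easy to confirm exhaustively (a sketch, not from the book): the [7, 3, 4] simplex code, whose generator columns are the seven nonzero vectors of F_2^3, has covering radius 2^2 − 1 = 3.

```python
n, m = 7, 3
# Generator matrix of S_3: column j - 1 holds the binary expansion of j = 1..7.
rows = [sum(((j >> i) & 1) << (j - 1) for j in range(1, 8)) for i in range(m)]

codewords = []
for msg in range(1 << m):
    c = 0
    for i, row in enumerate(rows):
        if (msg >> i) & 1:
            c ^= row
    codewords.append(c)

nonzero_weights = {bin(c).count("1") for c in codewords if c}   # all 2^{m-1} = 4
rho = max(min(bin(x ^ c).count("1") for c in codewords) for x in range(1 << n))
```

The search confirms both facts used in the proof: every nonzero codeword has weight 4, and ρ = 3.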
11.3
Covering radius of BCH codes
As with Reed–Muller codes, the covering radii of codes from certain subfamilies of BCH
codes have been computed. In particular, let B(t, m) be the primitive narrow-sense binary
BCH code of length n = 2^m − 1 and designed distance d = 2t + 1. Exercise 635 gives the
dimensions of these codes when t ≤ 2^{⌈m/2⌉−1}.
Exercise 635 Let C_i be the 2-cyclotomic coset modulo n = 2^m − 1 containing i.
(a) For 0 ≤ i ≤ n, let i = i_0 + i_1 2 + · · · + i_{m−2} 2^{m−2} + i_{m−1} 2^{m−1} with i_j ∈ {0, 1} be the
binary expansion of i. Thus we can associate with i the binary string
i ↔ i_0 i_1 · · · i_{m−2} i_{m−1}   (11.7)
of length m. Prove that C_i contains precisely the integers j whose associated binary
string is a cyclic shift of (11.7).
(b) Prove that if j is an odd integer with j ≤ 2^{⌈m/2⌉} − 1, then C_1, C_3, . . . , C_j are distinct
2-cyclotomic cosets each containing m elements.
(c) Prove that if 1 ≤ t ≤ 2^{⌈m/2⌉−1}, B(t, m) has dimension 2^m − 1 − tm.
(d) Demonstrate that the upper limit on j given in part (b) cannot be increased by showing
that when m = 5, C_9 equals one of the C_i with 1 ≤ i ≤ 7 and i odd.
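The 2-cyclotomic coset computations in this exercise are easy to script (a sketch, not from the book); for m = 5 the script exhibits C_9 = C_5 and recovers the dimension 31 − 2 · 5 = 21 of B(2, 5).

```python
def cyclotomic_coset(i, n):
    """2-cyclotomic coset of i modulo n."""
    coset, j = set(), i % n
    while j not in coset:
        coset.add(j)
        j = (2 * j) % n
    return frozenset(coset)

n = 31   # m = 5
odd_cosets = {i: cyclotomic_coset(i, n) for i in (1, 3, 5, 7)}

# Dimension of B(2, 5): n minus the number of roots alpha^i with i in C1 or C3.
dim_B25 = n - len(odd_cosets[1] | odd_cosets[3])
```

The four cosets C_1, C_3, C_5, C_7 come out distinct with 5 elements each, while C_9 collapses onto C_5.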
Table 11.2 Weight distribution of B(2, m)⊥

m odd:
  i                          A_i
  0                          1
  2^{m−1} − 2^{(m−1)/2}      (2^m − 1)(2^{m−2} + 2^{(m−3)/2})
  2^{m−1}                    (2^m − 1)(2^{m−1} + 1)
  2^{m−1} + 2^{(m−1)/2}      (2^m − 1)(2^{m−2} − 2^{(m−3)/2})

m even:
  i                          A_i
  0                          1
  2^{m−1} − 2^{m/2}          (1/3) 2^{(m−2)/2−1} (2^m − 1)(2^{(m−2)/2} + 1)
  2^{m−1} − 2^{m/2−1}        (1/3) 2^{(m+2)/2−1} (2^m − 1)(2^{m/2} + 1)
  2^{m−1}                    (2^m − 1)(2^{m−2} + 1)
  2^{m−1} + 2^{m/2−1}        (1/3) 2^{(m+2)/2−1} (2^m − 1)(2^{m/2} − 1)
  2^{m−1} + 2^{m/2}          (1/3) 2^{(m−2)/2−1} (2^m − 1)(2^{(m−2)/2} − 1)
Let ρBCH(t, m) denote the covering radius ρ(B(t, m)) of B(t, m). We can use the Supercode Lemma to obtain a lower bound on ρBCH(t, m) when t ≤ 2^{⌈m/2⌉−1}.
Theorem 11.3.1 If t ≤ 2^{⌈m/2⌉−1}, then ρBCH(t, m) ≥ 2t − 1.
Proof: By Exercise 635, B(t − 1, m) and B(t, m) have dimensions 2^m − 1 − (t − 1)m
and 2^m − 1 − tm, respectively. Therefore by the nesting property of BCH codes (Theorem 5.1.2), B(t, m) is a proper subcode of B(t − 1, m). Every vector x in B(t − 1, m) \
B(t, m) has weight at least the minimum weight of B(t − 1, m), which is at least 2t − 1 by
the BCH Bound. The result now follows from the Supercode Lemma.
The case of the single error-correcting codes is quite easy to handle.
Theorem 11.3.2 For m ≥ 2, ρBCH (1, m) = 1.
Proof: The code B(1, m) is the perfect single error-correcting Hamming code Hm . As it is
perfect, ρBCH (1, m) = 1.
We now turn our attention to the much more difficult case of the double error-correcting
codes. When m is odd, we can use the lower bound in Theorem 11.3.1 and an upper bound
due to Delsarte, which agrees with the lower bound. Because the Delsarte Bound, which we
proved as part (i) of Theorem 7.5.2, will be used in this section and the next, we restate it here.
Theorem 11.3.3 (Delsarte Bound) Let C be an [n, k] code over F_q. Let S = {i > 0 |
A_i(C⊥) ≠ 0} and s = |S|. Then ρ(C) ≤ s.
We remark that Delsarte actually proved this in [62] for nonlinear codes as well, with the
appropriate interpretation of s.
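As a quick check of the bound (a sketch, not from the book), take the self-dual [8, 4, 4] extended Hamming code R(1, 3): its dual weight distribution is its own, with nonzero weights {4, 8}, so s = 2; brute force shows the covering radius is exactly 2.

```python
# Generator of the [8, 4, 4] extended Hamming code (equal to R(1, 3)), self-dual.
G = [0b11111111, 0b00001111, 0b00110011, 0b01010101]

codewords = []
for m in range(1 << 4):
    c = 0
    for i, row in enumerate(G):
        if (m >> i) & 1:
            c ^= row
    codewords.append(c)

# Self-dual, so the dual weight distribution is the code's own.
S = {bin(c).count("1") for c in codewords if c}
s = len(S)
rho = max(min(bin(x ^ c).count("1") for c in codewords) for x in range(1 << 8))
```

Here the Delsarte Bound ρ ≤ s = 2 is met with equality.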
We will show that ρBCH (2, m) = 3, a result due to Gorenstein, Peterson, and Zierler [108].
Theorem 11.3.1 provides a lower bound of 3 on the covering radius. The weight distribution
is known for B(2, m)⊥ . We present it without proof in Table 11.2; those interested in the
proof are referred to [218, Chapter 15]. Using the Delsarte Bound we see from the table
that an upper bound on the covering radius is 3 when m is odd and 5 when m is even. So
when m is odd, ρBCH (2, m) = 3 as the upper and lower bounds agree; but this proof will
not work if m is even. The case m even will require an additional lemma involving the trace
function Tr_m from F_{2^m} to F_2. Recall that
Tr_m(α) = Σ_{i=0}^{m−1} α^{2^i}
for all α ∈ F_{2^m}.
Lemma 11.3.4 The equation x^2 + x + β = 0 for β ∈ F_{2^m} has a solution in F_{2^m} if Tr_m(β) =
0 and no solutions if Tr_m(β) = 1.
Proof: By the Normal Basis Theorem [196, Theorem 2.35], there is a basis
α, α^2, α^{2^2}, . . . , α^{2^{m−1}},
called a normal basis, of F_{2^m} as a vector space over F_2. By Exercise 637, Tr_m(α) = Tr_m(α^{2^i})
for all i. As the trace function is not identically 0 by Lemma 3.8.5, this implies, using the
F_2-linearity of the trace function described in Lemma 3.8.5, that
Tr_m(α^{2^i}) = 1 for all i.   (11.8)
Let
x = x_0 α + x_1 α^2 + x_2 α^{2^2} + · · · + x_{m−1} α^{2^{m−1}}, where x_i ∈ F_2, and
β = b_0 α + b_1 α^2 + b_2 α^{2^2} + · · · + b_{m−1} α^{2^{m−1}}, where b_i ∈ F_2.
Then
x^2 = x_{m−1} α + x_0 α^2 + x_1 α^{2^2} + · · · + x_{m−2} α^{2^{m−1}}.
If x is a solution of x^2 + x + β = 0, we must have
x_0 + x_{m−1} = b_0, x_1 + x_0 = b_1, . . . , x_{m−1} + x_{m−2} = b_{m−1}.
The sum of the left-hand sides of all these equations is 0, giving Σ_{i=0}^{m−1} b_i = 0. However,
by (11.8), Tr_m(β) = Σ_{i=0}^{m−1} b_i, proving that Tr_m(β) = 0 is a necessary condition for the
existence of a solution x ∈ F_{2^m} of x^2 + x + β = 0.
You are asked to prove the converse in Exercise 638.
Exercise 636 Using Example 3.4.3, find a normal basis of F_8 over F_2.
Exercise 637 If γ ∈ F_{2^m}, prove that Tr_m(γ) = Tr_m(γ^2).
Exercise 638 With the notation as in Lemma 11.3.4, prove that if Tr_m(β) = 0, then x^2 +
x + β = 0 has two solutions x ∈ F_{2^m}, where we can take x_0 equal to either 0 or 1 and
x_i = x_0 + Σ_{j=1}^{i} b_j for 1 ≤ i ≤ m − 1.
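Lemma 11.3.4 and Exercise 638 can be verified numerically in a small field (a sketch, not from the book; the choice of F_16 and of the primitive polynomial x^4 + x + 1 are assumptions of the sketch): x^2 + x + β = 0 is solvable exactly when Tr_4(β) = 0, and in that case it has exactly two solutions.

```python
M_DEG, MOD = 4, 0b10011   # F_16 built with the primitive polynomial x^4 + x + 1

def gf_mul(a, b):
    """Multiplication in F_16, with field elements as 4-bit polynomial masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return r

def trace(a):
    """Tr_4(a) = a + a^2 + a^4 + a^8."""
    t, p = 0, a
    for _ in range(M_DEG):
        t ^= p
        p = gf_mul(p, p)
    return t

# For each beta in F_16, count the solutions of x^2 + x + beta = 0.
solutions = {b: sum(gf_mul(x, x) ^ x ^ b == 0 for x in range(16)) for b in range(16)}
```

The counts come out as 2 solutions when Tr_4(β) = 0 and none when Tr_4(β) = 1, with exactly 2^{m−1} = 8 elements of trace 0.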
We are now ready to show that ρBCH (2, m) = 3. The case m even uses some of the
techniques we developed in Section 5.4 for decoding BCH codes.
Theorem 11.3.5 If m ≥ 3, then ρBCH (2, m) = 3.
Proof: The case m odd was proved above. Now assume that m is even. By Theorem 11.3.1,
we have ρBCH (2, m) ≥ 3. We are done if we show that every coset of B(2, m) contains a
vector of weight at most 3.
Let α be a primitive element of F_{2^m}. By Theorem 4.4.3 we can view

H = [ 1   α     α^2   · · ·   α^{n−1}
      1   α^3   α^6   · · ·   α^{3(n−1)} ]

as a parity check matrix for B(2, m). Let y(x) be a polynomial in F_2[x]/(x^{2^m−1} − 1). In
Section 5.4.1 we defined the syndrome S_i of y(x) as y(α^i). By Exercise 295, we have
H y^T = [ S_1
          S_3 ],   (11.9)
where y is the vector in F_2^{2^m−1} associated to y(x). By Theorem 1.11.5 and Exercise 295(b),
there is a one-to-one correspondence between syndromes and cosets. Thus it suffices to
show that for any pair of syndromes S_1 and S_3, there is a vector y with wt(y) ≤ 3 that
satisfies (11.9). Let X_i be the location numbers of y; see Section 5.4. Thus, by (5.8), we
only need to show that for any S_1 and S_3 in F_{2^m}, there exist X_1, X_2, and X_3 in F_{2^m} such
that
S_1 = X_1 + X_2 + X_3,
S_3 = X_1^3 + X_2^3 + X_3^3.   (11.10)
If we let z_i = X_i + S_1, then by Exercise 639 we obtain the equivalent system
0 = z_1 + z_2 + z_3,
s = z_1^3 + z_2^3 + z_3^3,   (11.11)
where s = S_1^3 + S_3. If s = 0, then z_1 = z_2 = z_3 = 0 is a solution of (11.11). So assume that
s ≠ 0. Substituting z_3 = z_1 + z_2 from the first equation in (11.11) into the second equation
gives
z_1^2 z_2 + z_1 z_2^2 = s.
We will find a solution with z_2 ≠ 0, in which case if we set z = z_1/z_2 we must solve
z^2 + z + s/z_2^3 = 0,   (11.12)
which is of the form examined in Lemma 11.3.4. Thus by the lemma we must find a z_2 such
that Tr_m(s/z_2^3) = 0.
Since s ≠ 0, we may assume that s = α^{3i+j}, where 0 ≤ j ≤ 2. There are two cases:
• Suppose j = 0. Let z_2 = α^i. Then Tr_m(s/z_2^3) = Tr_m(1) = 1 + 1 + · · · + 1 (m times). As
m is even, Tr_m(1) = 0 and so we have z_2 such that Tr_m(s/z_2^3) = 0.
• Suppose j = 1 or 2. Recalling that Tr_m is a surjective F_2-linear transformation of F_{2^m}
onto F_2, there are exactly 2^{m−1} elements of F_{2^m} whose trace is 0. Since m is even,
3 | (2^m − 1); thus there are exactly (2^m − 1)/3 elements of F_{2^m} that are nonzero cubes
by Exercise 640. Hence these counts show that there must be some nonzero element of
F_{2^m} which is not a cube but has trace 0; let θ = α^{3k+r} be such an element where r then
Table 11.3 Covering radius of binary [n = 2^m − 1, k, d] codes B(t, m)

m   n    k    t   d    ρBCH(t, m)
3   7    4    1   3    1
4   15   11   1   3    1
4   15   7    2   5    3
4   15   5    3   7    5
5   31   26   1   3    1
5   31   21   2   5    3
5   31   16   3   7    5
5   31   11   4   11   7
5   31   6    6   15   11
6   63   57   1   3    1
6   63   51   2   5    3
6   63   45   3   7    5
6   63   39   4   9    7
6   63   36   5   11   9
must be 1 or 2. Note that 0 = Tr_m(θ) = Tr_m(θ^2); hence if necessary we may replace θ by
θ^2 and thereby assume that j = r. Now let z_2 = α^{i−k}. Then s/z_2^3 = θ, which has trace 0
as needed. Thus we again have z_2 such that Tr_m(s/z_2^3) = 0.
Therefore in each case we have produced a nonzero z_2 ∈ F_{2^m} such that there is a solution z
of (11.12). Then z_1 = z_2 z and z_3 = z_1 + z_2 give the solution of (11.11) as required.
Exercise 639 Show that (11.10) and (11.11) are equivalent systems. Hint: Recall that
S_2 = S_1^2 = X_1^2 + X_2^2 + X_3^2.
Exercise 640 Show that if m is even, 3 | (2^m − 1) and there are exactly (2^m − 1)/3 elements
of F_{2^m} that are nonzero cubes. Hint: F*_{2^m} = F_{2^m} \ {0} is a cyclic group of order divisible
by 3.
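The table entry ρBCH(2, 4) = 3 can also be verified directly by machine (a sketch, not from the book): multiply the minimal polynomials of α and α^3 to obtain the generator polynomial of the [15, 7, 5] code B(2, 4), then evaluate (11.1) over all of F_2^15.

```python
def clmul(a, b):
    """Carry-less (GF(2)) multiplication of bitmask polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

m1 = 0b10011   # x^4 + x + 1, minimal polynomial of a primitive alpha in F_16
m3 = 0b11111   # x^4 + x^3 + x^2 + x + 1, minimal polynomial of alpha^3
g = clmul(m1, m3)   # degree-8 generator polynomial of B(2, 4)

n, k = 15, 7
codewords = [clmul(msg, g) for msg in range(1 << k)]   # multiples m(x)g(x), deg < 15

POP = bytes(bin(i).count("1") for i in range(1 << n))  # popcount lookup table
d = min(POP[c] for c in codewords if c)
rho = max(min(POP[x ^ c] for c in codewords) for x in range(1 << n))
```

The search confirms the minimum distance 5 and covering radius 3 = 2t − 1 from Table 11.3.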
One would hope to try a similar method for the triple error-correcting BCH codes. Assmus
and Mattson [7] showed that B(3, m)⊥ contains five nonzero weights when m is odd. So
the Delsarte Bound and Theorem 11.3.1 combine to show that ρBCH (3, m) = 5 for m odd.
When m is even, B(3, m)⊥ contains seven nonzero weights. The case m even was completed
by van der Horst and Berger [138] and by Helleseth [121] yielding the following theorem.
Theorem 11.3.6 If m ≥ 4, then ρBCH (3, m) = 5.
A computer calculation was done in [71, 72] that produced the covering radii of primitive narrow-sense binary BCH codes of lengths up to 63. The results are summarized in
Table 11.3 and are also found in [34]. Note that each covering radius in the table satisfies
ρBCH (t, m) = 2t − 1, which is the lower bound obtained in Theorem 11.3.1. The following
theorem, proved by Vlăduţ and Skorobogatov [340], shows that for very long BCH codes
the covering radius of B(t, m) is also equal to the lower bound 2t − 1.
Theorem 11.3.7 There is a constant M such that for all m ≥ M, ρBCH (t, m) = 2t − 1.
11.4
Covering radius of self-dual codes
In Chapters 9 and 10 we examined some well-known binary self-dual codes of moderate
lengths. In this section we want to compute the covering radius of some of these codes.
We begin by considering extremal Type II binary codes. In Example 1.11.7, we presented
the weight distribution of each coset of the unique [8, 4, 4] Type II binary code and concluded
that it has covering radius 2. This same result comes from Theorem 11.1.7. In a series of
examples and exercises in Section 7.5, we gave the complete coset weight distribution of
each of the two [16, 8, 4] Type II binary codes and concluded that the covering radius is 4
for each code. Exercise 641 gives another proof without resorting to specific knowledge of
the generator matrix of the code.
Exercise 641 Let C be a [16, 8, 4] Type II binary code.
(a) Use the Delsarte Bound to show that ρ(C) ≤ 4.
(b) Use Theorems 11.1.1 and 11.1.7 to show that 3 ≤ ρ(C).
(c) Suppose that C has covering radius 3. By Exercise 625, if C ∗ is obtained from C by
puncturing one coordinate, then ρ(C ∗ ) = 2. Show that this violates the Sphere Covering
Bound.
By Theorem 10.1.7 (or Example 8.3.2), we know that the unique [24, 12, 8] Type II
binary code has covering radius 4. As this code is extended perfect, Theorem 11.1.7 also
shows this. We now turn to the [32, 16, 8] Type II binary codes. In Section 9.7 we noted
that there are exactly five such codes. Without knowing the specific structure of any of
these except the number of weight 8 vectors, we can find their covering radii, a result found
in [9].
Theorem 11.4.1 Let C be a [32, 16, 8] Type II code. Then ρ(C) = 6.
Proof: We remark that the weight distribution of C is determined uniquely; see Theorem 9.3.1. It turns out that A8 (C) = 620. C = C ⊥ contains codewords of weights 8, 12,
16, 20, 24, and 32. So by the Delsarte Bound ρ(C) ≤ 6. By Theorems 11.1.1 and 11.1.7,
5 ≤ ρ(C) as C is not extended perfect (Exercise 642). Suppose that ρ(C) = 5. This implies
that every weight 6 vector in F_2^{32} is in a coset of even weight less than 6. The only coset of
weight 0 is the code itself, which does not contain weight 6 codewords. Let x be a vector of
weight 6. Then there must be a codeword c such that wt(x + c) = 2 or 4. The only possibility is that wt(c) = 8. Thus every weight 6 vector has support intersecting the support of
one of the 620 codewords of weight 8 in C in either five or six coordinates. The number of
weight 6 vectors whose support intersects the support of a weight 8 vector in this manner
is \binom{24}{1}\binom{8}{5} + \binom{8}{6} = 1372. Thus there are at most 620 · 1372 = 850 640 weight 6 vectors in
F_2^{32}. But as there are \binom{32}{6} = 906 192 weight 6 vectors in F_2^{32}, we have a contradiction.
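The arithmetic in this proof is easy to confirm mechanically; the following sketch simply redoes the count with `math.comb`:

```python
# Re-check of the counting argument in the proof of Theorem 11.4.1.
from math import comb

# Weight 6 vectors whose support meets the support of a fixed weight 8
# codeword in exactly five or six coordinates.
per_codeword = comb(8, 5) * comb(24, 1) + comb(8, 6)
assert per_codeword == 1372

covered = 620 * per_codeword   # at most this many weight 6 vectors are covered
total = comb(32, 6)            # all weight 6 vectors in F_2^32
assert covered == 850_640
assert total == 906_192
assert covered < total         # the desired contradiction
```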
Exercise 642 Use the Sphere Packing Bound to show that a [32, 16, 8] even code is
not extended perfect. Remark: We know this if we use the fact that the only perfect
Table 11.4 Coset distribution of C = R(2, 5), where Ai (v + C) = An−i (v + C)

The table lists, for each coset weight, the number of vectors of each weight 0 ≤ i ≤ 16
in a coset (blank entries being 0), together with the number of cosets. The weight 4
cosets fall into five kinds, containing 1, 2, 3, 4, and 5 vectors of weight 4, respectively.
Only the coset counts are reproduced here:

Coset weight                 Number of cosets
0                                       1
1                                      32
2                                     496
3                                   4 960
4 (one weight 4 vector)            11 160
4 (two weight 4 vectors)            6 200
4 (three weight 4 vectors)          2 480
4 (four weight 4 vectors)             930
4 (five weight 4 vectors)             248
5                                  27 776
6                                  11 253
multiple error-correcting codes are the Golay codes from Theorem 1.12.3, but we do not
need to use this powerful theorem to verify that a [32, 16, 8] even code is not extended
perfect.
Exercise 643 In this exercise, you will find part of the coset weight distributions of cosets
of a [32, 16, 8] Type II code C.
(a) The weight distribution of C is A0 = A32 = 1, A8 = A24 = 620, A12 = A20 = 13 888,
and A16 = 36 518. Verify this by solving the first five Pless power moments (P1 ); see
also (7.8).
(b) By Theorem 9.3.10, the codewords of weights 8, 12, and 16 hold 3-designs. Find the
Pascal triangle for each of these 3-designs.
(c) In Table 11.4, we give the complete coset weight distribution of the [32, 16, 8] Type
II code R(2, 5), where blank entries are 0. For cosets of weights 1, 2, and 3, verify
that any [32, 16, 8] Type II code has the same coset weight distribution and number of
cosets as R(2, 5) does. Hint: See Example 8.4.6.
(d) Repeat part (c) for the cosets of weight 5.
(e) For cosets of weight 6, verify that any [32, 16, 8] Type II code has the same coset weight
distribution as R(2, 5) does. Note that the number of cosets of weight 6 depends on the
code.
By this exercise, the complete coset weight distribution of each of the five [32, 16, 8] Type II
codes is known once the number and distribution of the weight 4 cosets is known, from
which the number of weight 6 cosets can be computed. These have been computed in [45]
and are different for each code.
We summarize, in Table 11.5, what is known [9] about the covering radii of the extremal Type II binary codes of length n ≤ 64. An argument similar to that in the proof of
Theorem 11.4.1 gives the upper and lower bounds for n = 40, 56, and 64. We remark that
there exists an extremal Type II code of length 40 with covering radius 7 and another with
covering radius 8; see [34]. The exact value for the covering radius of an extremal Type II
code of length 48 is 8; this result, found in [9], uses the fact that the weight 12 codewords
of such a code hold a 5-design by Theorem 9.3.10. The only [48, 24, 12] Type II code is
Table 11.5 Covering radii ρ of extremal Type II binary codes of length n ≤ 64

 n    k    d    ρ
 8    4    4    2
16    8    4    4
24   12    8    4
32   16    8    6
40   20    8    6–8
48   24   12    8
56   28   12    8–10
64   32   12    8–12
the extended quadratic residue code; see Research Problem 9.3.7. Part of the coset weight
distribution of the extended quadratic residue code is given in [62]; from this the complete
coset distribution can be computed.
There are many similarities between the coset distributions of the length 32 Type II
codes discussed in Exercise 643 and those of length 48. For example, the cosets of weights
1 through 5 and 7 in the length 48 codes have uniquely determined coset distributions
and uniquely determined numbers of cosets. The cosets of weight 8 also have uniquely
determined distributions but not number of cosets of weight 8.
Exercise 644 Verify the bounds given in Table 11.5 for n = 40, 56, and 64. When n = 40,
the number of weight 8 codewords in an extremal Type II code is 285. When n = 56, the
number of weight 12 codewords in an extremal Type II code is 8190. When n = 64, the
number of weight 12 codewords in an extremal Type II code is 2976.
Near the end of Section 9.4 we defined the child of a self-dual code. Recall that if C is
an [n, n/2, d] self-dual binary code with d ≥ 3, then a child C′∗ of C is obtained by fixing
two coordinates, taking the subcode C′ that is 00 or 11 on those two coordinates, and then
puncturing C′ on those fixed coordinates. This produces an [n − 2, (n − 2)/2, d′∗] self-dual
code with d′∗ ≥ d − 2. The child of a self-dual code can have a larger covering radius than
the parent as the following result shows.
Theorem 11.4.2 Let C be an [n, n/2, d] self-dual code with d ≥ 3. Suppose that C′∗ is a
child of C. Then ρ(C′∗) ≥ d − 1.
Proof: Suppose that the two fixed coordinates which are used to produce C′∗ are the two
left-most coordinates. Let C′ be as above. Choose x ∈ C to have minimum nonzero weight
among all codewords of C which are 10 or 01 on the first two coordinates. So x = 10y or
01y. As adding a codeword c′ of C′ to x produces a codeword c = c′ + x of C with first two
coordinates 01 or 10, wt(c) ≥ wt(x) by the choice of x. This means that y must be a coset
leader of the coset y + C′∗. Since wt(y) ≥ d − 1, the result follows.
A child of the [24, 12, 8] extended binary Golay code is a [22, 11, 6] extremal Type I
code discussed in Example 9.4.12. Its covering radius is three more than that of its parent
as Exercise 645 shows.
Exercise 645 Let C be the [24, 12, 8] extended binary Golay code and C′∗ its child. Prove
that ρ(C′∗) = 7.
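The Golay child of Exercise 645 is inconvenient to search by hand, but Theorem 11.4.2 can be illustrated exhaustively on a smaller code. The sketch below is an illustration, not part of the exercise: it takes C to be the unique [8, 4, 4] Type II code, fixes its last two coordinates (an arbitrary choice), and computes both covering radii by brute force:

```python
# Illustration of Theorem 11.4.2 with the [8,4,4] extended Hamming code C:
# rho(C) = 2, while its child C'* is a [6,3,2] code with rho(C'*) = 3 = d - 1,
# so the bound of the theorem is attained here.
from itertools import product

def span(basis, n):
    """All codewords, as tuples, of the binary code generated by basis."""
    words = {tuple([0] * n)}
    for b in basis:
        words |= {tuple((x + y) % 2 for x, y in zip(w, b)) for w in words}
    return words

def covering_radius(code, n):
    return max(min(sum(a != b for a, b in zip(v, c)) for c in code)
               for v in product([0, 1], repeat=n))

G = [(1,0,0,0,0,1,1,1),   # generator matrix of the [8,4,4] extended Hamming code
     (0,1,0,0,1,0,1,1),
     (0,0,1,0,1,1,0,1),
     (0,0,0,1,1,1,1,0)]
C = span(G, 8)
assert covering_radius(C, 8) == 2

# Child: keep the codewords that are 00 or 11 on the last two coordinates,
# then puncture those two coordinates.
child = {c[:6] for c in C if c[6] == c[7]}
assert len(child) == 2**3                         # a [6,3] code
assert min(sum(c) for c in child if any(c)) == 2  # d'* = 2 >= d - 2
assert covering_radius(child, 6) == 3             # = d - 1
```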
11.5 The length function
In this section we examine the length function ℓq (m, r ) for 1 ≤ r ≤ m, where ℓq (m, r )
is the smallest length of any linear code over Fq of redundancy m and covering radius
r . Closely related to the length function is the function tq (n, k), defined as the smallest
covering radius of any [n, k] code over Fq . When q = 2, we will denote these functions by
ℓ(m, r ) and t(n, k) respectively. Knowing one function will give the other, as we shall see
shortly. Finding values of these functions and codes that realize these values is fundamental
to finding codes with good covering properties. We first give some elementary facts about
these two functions.
Theorem 11.5.1 The following hold:
(i) tq (n + 1, k + 1) ≤ tq (n, k).
(ii) ℓq (m, r + 1) ≤ ℓq (m, r ).
(iii) ℓq (m, r ) ≤ ℓq (m + 1, r ) − 1.
(iv) ℓq (m + 1, r + 1) ≤ ℓq (m, r ) + 1.
(v) If ℓq (m, r ) ≤ n, then tq (n, n − m) ≤ r .
(vi) If tq (n, k) ≤ r, then ℓq (n − k, r ) ≤ n.
(vii) ℓq (m, r ) equals the smallest integer n such that tq (n, n − m) ≤ r .
(viii) tq (n, k) equals the smallest integer r such that ℓq (n − k, r ) ≤ n.
Proof: We leave the proofs of (i), (vi), and (vii) as an exercise.
For (ii), let n = ℓq (m, r ). Then there exists an [n, n − m] code with covering radius
r , which has parity check matrix H = [Im A]. Among all such parity check matrices for
[n, n − m] codes with covering radius r , choose A to be the one with some column wT of
smallest weight. This column cannot be the zero column as otherwise we can puncture the
column and obtain a code of smaller length with redundancy m and covering radius r by
Theorem 11.1.2(ii). Replace some nonzero entry in wT with 0 to form the matrix A1 with wT1
in place of wT ; so w = w1 + α1 u1 where uT1 is one of the columns of Im . Then H1 = [Im A1 ]
is the parity check matrix of an [n, n − m] code C 1 satisfying ρ(C 1 ) ≠ r by the choice of A.
Any syndrome sT can be written as a linear combination of r or fewer columns of H . If wT is
involved in one of these linear combinations, we can replace wT by wT1 − α1 uT1 . This allows
us to write sT as a linear combination of r + 1 or fewer columns of H1 . Thus ρ(C 1 ) ≤ r + 1.
Since ρ(C 1 ) ≠ r , either ρ(C 1 ) ≤ r − 1 or ρ(C 1 ) = r + 1. We assume that ρ(C 1 ) ≤ r − 1 and will
obtain a contradiction. If w1 = 0, then every linear combination of columns of H1 is in fact
a linear combination of columns of H implying by Theorem 11.1.2(ii) that ρ(C) ≤ ρ(C 1 ) ≤
r − 1, a contradiction. Therefore replace some nonzero entry in wT1 with 0 to form the matrix
A2 with wT2 in place of wT1 , where w1 = w2 + α2 u2 and uT2 is one of the columns of Im .
Then H2 = [Im A2 ] is the parity check matrix of an [n, n − m] code C 2 . Arguing as above,
ρ(C 2 ) is at most 1 more than ρ(C 1 ) implying ρ(C 2 ) ≤ r . By the choice of A, ρ(C 2 ) ≠ r and
thus ρ(C 2 ) ≤ r − 1. We can continue inductively constructing a sequence of codes C i with
parity check matrices Hi = [Im Ai ], where Ai is obtained from Ai−1 by replacing a nonzero
entry in wi−1 by 0 to form wi . Each code satisfies ρ(C i ) ≤ r − 1. This process continues
until step i = s where wt(ws ) = 1. So this column ws of Hs is a multiple of some column
of Im . But then every linear combination of the columns of Hs is a linear combination of
the columns of H implying by Theorem 11.1.2(ii) that ρ(C) ≤ ρ(C s ) ≤ r − 1, which is a
contradiction. Thus ρ(C 1 ) = r + 1 and therefore ℓq (m, r + 1) ≤ n = ℓq (m, r ) proving (ii).
For part (iii), let H be a parity check matrix for a code of length n = ℓq (m + 1, r ), redundancy m + 1, and covering radius r . By row reduction, we may assume that [1 0 0 · · · 0]T
is the first column of H . Let H ′ be the m × (n − 1) matrix obtained from H by removing
the first row and column of H. The code C ′ , with parity check matrix H ′ , has length n − 1
and redundancy m. Using the first column of H in a linear combination of the columns
of H affects only the top entry; therefore ρ(C ′ ) = r ′ ≤ r . Thus ℓq (m, r ′ ) ≤ n − 1. But
ℓq (m, r ) ≤ ℓq (m, r ′ ) by induction on (ii). Part (iii) follows.
To verify (iv), let C be a code of length n = ℓq (m, r ), redundancy m, and covering radius
r . Let Z be the zero code of length 1, which has covering radius 1. Then C 1 = C ⊕ Z is a
code of length n + 1, redundancy m + 1, and covering radius r + 1 by Theorem 11.1.6(i).
Therefore ℓq (m + 1, r + 1) ≤ n + 1 = ℓq (m, r ) + 1 and (iv) holds.
For part (v), suppose that ℓq (m, r ) = n ′ ≤ n. Then there is an [n ′ , n ′ − m] code with
covering radius r . So tq (n ′ , n ′ − m) ≤ r . Applying (i) repeatedly, we obtain tq (n, n − m) ≤
tq (n ′ , n ′ − m) completing (v).
Suppose that tq (n, k) = r . Then there exists an [n, k] code with covering radius r implying
that ℓq (n − k, r ) ≤ n. Let r ′ be the smallest value of ρ such that ℓq (n − k, ρ) ≤ n. Then
r ′ ≤ r , and there is an [n ′ , n ′ − n + k] code with n ′ ≤ n and covering radius r ′ . Thus
tq (n ′ , n ′ − n + k) ≤ r ′ . By repeated use of (i), tq (n, k) ≤ tq (n ′ , n ′ − n + k) ≤ r ′ implying
r ≤ r ′ . Hence r = r ′ verifying (viii).
Exercise 646 Prove the remainder of Theorem 11.5.1 by doing the following:
(a) Prove Theorem 11.5.1(i). Hint: Suppose that tq (n, k) = r . Thus there is an [n, k] code
C over Fq with covering radius r . Let H be a parity check matrix for C. Choose any
column to adjoin to H . What happens to the covering radius of the code with this new
parity check matrix?
(b) Prove Theorem 11.5.1(vi). Hint: See the proof of Theorem 11.5.1(v).
(c) Prove Theorem 11.5.1(vii). Hint: This follows directly from the definitions.
Exercise 647 Suppose that C is a linear binary code of length ℓ(m, r ) with 1 ≤ r where
C has redundancy m and covering radius r . Show that C has minimum distance at least 3
except when m = r = 1.
There is an upper and lower bound on ℓq (m, r ), as we now see.
Theorem 11.5.2 Let r ≤ m. Then m ≤ ℓq (m, r ) ≤ (q^{m−r+1} − 1)/(q − 1) + r − 1.
Furthermore:
(i) m = ℓq (m, r ) if and only if r = m, and the only code satisfying this condition is the
zero code, and
(ii) ℓq (m, 1) = (q^m − 1)/(q − 1), and the only code satisfying this condition is the
Hamming code Hq,m .
Proof: As the length of a code must equal or exceed its redundancy, m ≤ ℓq (m, r ), giving
the lower bound on ℓq (m, r ). The length of a code equals its redundancy if and only if the
code is the zero code. But the zero code of length m has covering radius m. Any code of
length m and covering radius m must have redundancy m by the Redundancy Bound. This
completes (i).
We next prove the upper bound on ℓq (m, r ). Let C be the direct sum of the Hamming
code Hq,m−r+1 of length (q^{m−r+1} − 1)/(q − 1), which has covering radius 1, and the zero
code of length r − 1, which has covering radius r − 1. By Theorem 11.1.6(i), ρ(C) = r ,
verifying the upper bound.
This upper bound shows that ℓq (m, 1) ≤ (q^m − 1)/(q − 1). Since the covering radius is
1, a code meeting the bound must have a nonzero scalar multiple of every nonzero syndrome
in its parity check matrix. As the length is (q m − 1)/(q − 1), the parity check matrix cannot
have two columns that are scalar multiples of each other, implying (ii).
Notice that in the proof, the upper bound on ℓq (m, r ) was obtained by constructing a
code of redundancy m and covering radius r . This is analogous to the way upper bounds
were obtained for Aq (n, d) and Bq (n, d) in Chapter 2.
Now suppose that we know all values of tq (n, k). From this we can determine all values of
ℓq (m, r ) as follows. Choose r ≤ m. By Theorem 11.5.1(vii), we need to find the smallest n
such that tq (n, n − m) ≤ r . By Theorem 11.5.2, there is an upper bound n ′ on ℓq (m, r ); also
by Theorem 11.5.1(i), tq (n, n − m) ≤ tq (n − 1, (n − 1) − m) implying that as n decreases,
tq (n, n − m) increases. Therefore starting with n = n ′ and decreasing n in steps of 1, find
the first value of n where tq (n, n − m) > r ; then ℓq (m, r ) = n + 1.
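This procedure is easy to mechanize. The sketch below runs it for q = 2 and m = 5, using the values of t(n, n − 5) quoted in Exercise 648(c):

```python
# From t-values to the length function, as described above: decrease n from
# an upper bound until t(n, n - m) first exceeds r; then l(m, r) = n + 1
# (Theorem 11.5.1(vii)).
t = {n: 2 for n in range(9, 16)}        # t(15,10) = ... = t(9,4) = 2
t.update({8: 3, 7: 3, 6: 3, 5: 5})      # t(8,3) = t(7,2) = t(6,1) = 3, t(5,0) = 5

def length_function(m, r, t, n_start):
    n = n_start
    while n in t and t[n] <= r:
        n -= 1
    return n + 1

assert length_function(5, 2, t, 15) == 9
assert length_function(5, 3, t, 15) == 6
assert length_function(5, 4, t, 15) == 6
```

These are exactly the values ℓ(5, 2) = 9 and ℓ(5, 3) = ℓ(5, 4) = 6 that Exercise 648 asks for.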
Exercise 648 Do the following (where q = 2):
(a) Using Theorem 11.5.2, give upper bounds on ℓ(5, r ) when 1 ≤ r ≤ 5.
(b) What are ℓ(5, 1) and ℓ(5, 5)?
(c) The following values of t(n, n − 5) are known: t(15, 10) = t(14, 9) = · · · = t(9, 4) =
2, t(8, 3) = t(7, 2) = t(6, 1) = 3, and t(5, 0) = 5. Use these values to compute ℓ(5, r )
for 2 ≤ r ≤ 4. You can check your answers in Table 11.6.
(d) For 2 ≤ r ≤ 4, find parity check matrices for the codes that have length ℓ(5, r ), redundancy 5, and covering radius r .
Conversely, fix m and suppose we know ℓq (m, r ) for 1 ≤ r ≤ m, which by Theorem 11.5.1(ii) is a decreasing sequence as r increases. Then the smallest integer r such
that ℓq (m, r ) ≤ m + j is the value of tq (m + j, j) for 1 ≤ j by Theorem 11.5.1(viii).
Exercise 649 The following values of ℓ(7, r ) are known:

r          1     2    3    4   5   6   7
ℓ(7, r )   127   19   11   8   8   8   7
Using this information, compute t(7 + j, j) for 1 ≤ j ≤ 20.
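The requested computation can be sketched directly from Theorem 11.5.1(viii), using the table of values of ℓ(7, r) above:

```python
# Exercise 649 mechanized: t(7 + j, j) is the smallest r with l(7, r) <= 7 + j
# (Theorem 11.5.1(viii)).
ell7 = {1: 127, 2: 19, 3: 11, 4: 8, 5: 8, 6: 8, 7: 7}

def t_from_ell(j):
    return min(r for r in ell7 if ell7[r] <= 7 + j)

assert [t_from_ell(j) for j in range(1, 5)] == [4, 4, 4, 3]   # n = 8, 9, 10, 11
assert t_from_ell(12) == 2                                    # l(7, 2) = 19 <= 19
assert all(t_from_ell(j) == 2 for j in range(12, 21))
```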
We now turn our focus to the length function for binary codes. We accumulate some
results about ℓ(m, r ) = ℓ2 (m, r ), most of which we do not prove.
Table 11.6 Bounds on ℓ(m, r )

m/r        2           3          4          5         6        7        8        9       10
 2         2
 3         4           3
 4         5           5          4
 5         9           6          6          5
 6        13           7          7          7         6
 7        19          11          8          8         8        7
 8        25–26       14          9          9         9        9        8
 9        34–39       17–18      13         10        10       10       10        9
10        47–53       21–22      16         11        11       11       11       11       10
11        65–79       23         17–20      15        12       12       12       12       12
12        92–107      31–37      19–23      18        13       13       13       13       13
13       129–159      38–53      23–25      19        17       14       14       14       14
14       182–215      47–63      27–29      21–24     20       15       15       15       15
15       257–319      60–75      32–37      23–27     21       19       16       16       16
16       363–431      75–95      37–49      27–31     22–25    22       17       17       17
17       513–639      93–126     44–62      30–35     24–29    23       21       18       18
18       725–863     117–153     53–77      34–41     27–33    24–27    24       19       19
19      1025–1279    148–205     62–84      39–47     30–36    25–30    25       23       20
20      1449–1727    187–255     73–93      44–59     33–40    28–31    26–29    26       21
21      2049–2559    235–308     86–125     51–75     37–44    30–38    27–32    27       25
22      2897–3455    295–383    103–150     57–88     41–45    33–41    29–33    28–31    28
23      4097–5119    371–511    122–174     65–98     46–59    37–45    31–37    29–34    29
24      5794–6911    467–618    144–190     76–107    51–73    40–47    34–41    30–35    30–33
Theorem 11.5.3 The following hold:
(i) ℓ(m, r ) = m + 1 if ⌈m/2⌉ ≤ r < m.
(ii) ℓ(2s + 1, s) = 2s + 5 if s ≥ 1.
(iii) ℓ(2s, s − 1) = 2s + 6 if s ≥ 4.
(iv) ℓ(2s + 1, s − 1) = 2s + 7 if s ≥ 6.
(v) ℓ(2s − 1, 2) ≥ 2^s + 1 if s ≥ 3.
Proof: We only prove (i). Parts (ii), (iii), and (iv) are proved in [34]; they were first proved
in [35, 41]. Part (v) is found in [322].
The binary repetition code of length m + 1 has covering radius ⌈m/2⌉ by Exercise 617.
As it is of redundancy m, ℓ(m, ⌈m/2⌉) ≤ m + 1. If ⌈m/2⌉ ≤ r < m, then ℓ(m, r ) ≤
ℓ(m, ⌈m/2⌉) by Theorem 11.5.1(ii). By Theorem 11.5.2, m + 1 ≤ ℓ(m, r ). Combining these
inequalities yields (i).
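The construction used in the proof of (i) can be checked exhaustively for small m; a minimal brute-force sketch:

```python
# The binary repetition code of length m + 1 has covering radius ceil(m/2):
# every vector is within floor((m+1)/2) of the nearer of the all-zeros and
# all-ones words, and a vector that is as balanced as possible attains this.
for m in range(1, 9):
    n = m + 1
    code = [0, 2**n - 1]               # the two codewords, as n-bit integers
    rho = max(min(bin(v ^ c).count("1") for c in code) for v in range(2**n))
    assert rho == (m + 1) // 2         # = ceil(m/2)
```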
Table 11.6, found in [207], gives upper and lower bounds on ℓ(m, r ) for 2 ≤ r ≤
min{m, 10} with m ≤ 24. By Theorem 11.5.2, ℓ(m, 1) = 2^m − 1 and these values are excluded from the table. References for the bounds can be found in [207]. The upper bounds
are obtained by constructing codes with given covering radius and redundancy; the lower
bounds are obtained by various results including those of this section.
Exercise 650 Using the theorems of this section, verify the entries in Table 11.6 for m ≤ 8
excluding ℓ(6, 2), ℓ(7, 2), and ℓ(8, 2).
Exercise 651 Verify the lower bounds in Table 11.6 for ℓ(m, 2) when m ≥ 11 and m is
odd.
Exercise 652 Do the following:
(a) Use the Sphere Covering Bound to show that
min { n : \sum_{i=0}^{r} \binom{n}{i} ≥ 2^m } ≤ ℓ(m, r ).
(b) Use part (a) to show that 23 ≤ ℓ(11, 3).
(c) The existence of what code guarantees that ℓ(11, 3) = 23?
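The lower bound of part (a) is easy to evaluate numerically; the sketch below also checks the two other places in this section where it is used:

```python
# The Sphere Covering Bound lower bound of Exercise 652(a): the smallest n
# with binom(n,0) + ... + binom(n,r) >= 2^m is a lower bound on l(m, r).
from math import comb

def sphere_covering_lb(m, r):
    n = m
    while sum(comb(n, i) for i in range(r + 1)) < 2**m:
        n += 1
    return n

assert sphere_covering_lb(11, 3) == 23   # part (b): 23 <= l(11, 3)
assert sphere_covering_lb(5, 2) == 8     # used in Exercise 655(b)
assert sphere_covering_lb(6, 2) == 11    # used in Example 11.5.6
```

For (m, r) = (11, 3) the bound is met with equality by the [23, 12] binary Golay code, which answers part (c).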
We now give one construction, called the ADS construction, of binary codes that assists
in providing many of the upper bounds in Table 11.6; for other constructions, see [34].
Before we give the construction, we must discuss the concept of normal codes and
acceptable coordinates. Recall that if S ⊂ F_q^n and x ∈ F_q^n, then d(x, S) is the minimum
distance from x to any vector in S; in the case that S is empty, set d(x, ∅) = n. Let C be
a binary linear code of length n. For a ∈ F_2 and 1 ≤ i ≤ n, let C^{a,(i)} denote the subset of
C consisting of the codewords in C whose ith coordinate equals a. Note that C^{0,(i)} either
equals C, in which case C^{1,(i)} = ∅, or C^{0,(i)} has codimension 1 in C, in which case C^{1,(i)} is
the coset C \ C^{0,(i)}. For 1 ≤ i ≤ n, the norm of C with respect to coordinate i is defined by

N^{(i)}(C) = max_{x ∈ F_2^n} ( d(x, C^{0,(i)}) + d(x, C^{1,(i)}) ).

The norm of C is defined to be

N(C) = min_{1 ≤ i ≤ n} N^{(i)}(C).
Coordinate i is called acceptable if N (i) (C) = N (C). One relationship between the norm
and covering radius of a code is given by the following lemma.
Lemma 11.5.4 Let C be a binary linear code. Then

ρ(C) ≤ ⌊N(C)/2⌋.

Proof: Suppose that C has length n. For x ∈ F_2^n and 1 ≤ i ≤ n,

d(x, C) = min{ d(x, C^{0,(i)}), d(x, C^{1,(i)}) } ≤ (1/2)( d(x, C^{0,(i)}) + d(x, C^{1,(i)}) ).

Therefore

ρ(C) = max_{x ∈ F_2^n} d(x, C) ≤ max_{x ∈ F_2^n} (1/2)( d(x, C^{0,(i)}) + d(x, C^{1,(i)}) ) = N^{(i)}(C)/2.

For some i, the latter equals N(C)/2. The result follows.
So, 2ρ(C) ≤ N (C). We attach a specific name to codes with norms as close to this lower
bound as possible. The code C is normal provided N (C) ≤ 2ρ(C) + 1. Thus for normal
codes, N (C) = 2ρ(C) or N (C) = 2ρ(C) + 1. Interestingly to this point no one has found a
binary linear code that is not normal. (There are binary nonlinear codes that are not normal
[136].)
We are now ready to present the ADS construction of a binary code. As motivation, recall
that the direct sum of two binary codes yields a code with covering radius the sum of the
covering radii of the two component codes by Theorem 11.1.6(i). In certain circumstances,
from these component codes, we can construct a code of length one less than that of the
direct sum but with covering radius the sum of the covering radii of the two codes. Thus
we “save” a coordinate. Let C 1 and C 2 be binary [n 1 , k1 ] and [n 2 , k2 ] codes, respectively.
The amalgamated direct sum, or ADS, of C 1 and C 2 (with respect to the last coordinate of
C 1 and the first coordinate of C 2 ) is the code C of length n 1 + n 2 − 1, where
C = { (a, 0, b) | (a, 0) ∈ C_1^{0,(n_1)}, (0, b) ∈ C_2^{0,(1)} } ∪ { (a, 1, b) | (a, 1) ∈ C_1^{1,(n_1)}, (1, b) ∈ C_2^{1,(1)} }.
As long as either C 1 is not identically zero in coordinate n 1 or C 2 is not identically zero in
coordinate 1, C is an [n 1 + n 2 − 1, k1 + k2 − 1] code.
Exercise 653 Do the following:
(a) Let C 1 and C 2 be binary [n 1 , k1 ] and [n 2 , k2 ] codes, with parity check matrices H1 =
[H1′ h′ ] and H2 = [h′′ H2′′ ], respectively, where h′ and h′′ are column vectors. Let C be
the ADS of C 1 and C 2 , where we assume that C 1 is not identically zero in coordinate n 1
or C 2 is not identically zero in coordinate 1. Show that the parity check matrix of C is
H1′
O
h′
h′′
O
.
H2′′
(b) What happens in the ADS of C 1 and C 2 when C 1 is identically zero in coordinate n 1
and C 2 is identically zero in coordinate 1?
We now show that the ADS construction yields a code with covering radius at most the
sum of the covering radii of component codes provided these codes are normal. Notice that
the redundancy of the ADS of C 1 and C 2 is the same as the redundancy of C 1 ⊕ C 2 . If they
have the same covering radius, we have created a shorter code of the same redundancy and
covering radius, which is precisely what the length function considers. The result is found
in [52, 111].
Theorem 11.5.5 Assume that C 1 and C 2 are normal codes of lengths n 1 and n 2 , respectively,
with the last coordinate of C 1 and the first coordinate of C 2 both acceptable. In addition
assume that each of the sets C_1^{0,(n_1)}, C_1^{1,(n_1)}, C_2^{0,(1)}, and C_2^{1,(1)} is nonempty. Let C be the ADS
of C_1 and C_2 with respect to the last coordinate of C_1 and the first coordinate of C_2. Then

ρ(C) ≤ ρ(C_1) + ρ(C_2).
Proof: Let z = (x, 0, y) ∈ F_2^{n_1+n_2−1}. Clearly,

d(z, C^{0,(n_1)}) = d((x, 0), C_1^{0,(n_1)}) + d((0, y), C_2^{0,(1)}), and
d(z, C^{1,(n_1)}) = d((x, 0), C_1^{1,(n_1)}) + d((0, y), C_2^{1,(1)}) − 1.

Therefore, adding these two equations and using the definitions,

d(z, C^{0,(n_1)}) + d(z, C^{1,(n_1)}) ≤ N^{(n_1)}(C_1) + N^{(1)}(C_2) − 1.    (11.13)

This also holds for z = (x, 1, y) ∈ F_2^{n_1+n_2−1}. As (11.13) is true for all z ∈ F_2^{n_1+n_2−1},

N^{(n_1)}(C) ≤ N^{(n_1)}(C_1) + N^{(1)}(C_2) − 1.

Using Lemma 11.5.4 and the normality assumptions for C_1 and C_2, we see that

2ρ(C) ≤ N(C) ≤ N(C_1) + N(C_2) − 1 ≤ 2ρ(C_1) + 2ρ(C_2) + 1,

proving the theorem.
Exercise 654 Let C be a binary code of length n. Prove the following:
(a) If PAut(C) is transitive, then N (i) (C) = N ( j) (C) for 1 ≤ i < j ≤ n. Furthermore all
coordinates are acceptable.
(b) If c ∈ C and x ∈ F_2^n, then

d(x + c, C^{0,(i)}) + d(x + c, C^{1,(i)}) = d(x, C^{0,(i)}) + d(x, C^{1,(i)}).

(c) The [7, 4, 3] binary Hamming code H3 is normal and every coordinate is acceptable.
Hint: By part (a) you only have to compute N^{(1)}(H3) as H3 is cyclic. By part (b) you
only have to check coset representatives x of H3 in F_2^7 when calculating d(x, H3^{0,(1)}) +
d(x, H3^{1,(1)}).
(d) The [3, 1, 3] binary repetition code is normal and every coordinate is acceptable.
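Parts (c) and (d) are small enough to verify by exhaustive computation. The sketch below computes every N^{(i)} directly from the definition rather than using the shortcuts in the hints; the Hamming generator matrix is one standard choice:

```python
# Brute-force check of Exercise 654(c)-(d): the [7,4] Hamming code and the
# [3,1] repetition code are normal, and every coordinate is acceptable.
from itertools import product

def span(basis, n):
    words = {tuple([0] * n)}
    for b in basis:
        words |= {tuple((x + y) % 2 for x, y in zip(w, b)) for w in words}
    return words

def norm_i(code, n, i):
    c0 = [c for c in code if c[i] == 0]
    c1 = [c for c in code if c[i] == 1]
    def d(x, s):
        return min((sum(a != b for a, b in zip(x, t)) for t in s), default=n)
    return max(d(x, c0) + d(x, c1) for x in product([0, 1], repeat=n))

def covering_radius(code, n):
    return max(min(sum(a != b for a, b in zip(x, c)) for c in code)
               for x in product([0, 1], repeat=n))

for basis, n in [([(1,0,0,0,0,1,1), (0,1,0,0,1,0,1),
                   (0,0,1,0,1,1,0), (0,0,0,1,1,1,1)], 7),   # Hamming code H_3
                 ([(1,1,1)], 3)]:                            # repetition code
    code = span(basis, n)
    rho = covering_radius(code, n)
    norms = [norm_i(code, n, i) for i in range(n)]
    assert len(set(norms)) == 1          # every coordinate is acceptable
    assert min(norms) <= 2 * rho + 1     # the code is normal
```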
Example 11.5.6 Let C 1 = C 2 = H3 and let C be the ADS of C 1 and C 2 . C is a [13, 7]
code. As H3 is perfect, ρ(H3 ) = 1. By Exercise 654, H3 is normal. Therefore by Theorem 11.5.5, ρ(C) ≤ 2. If ρ(C) = 1, then every syndrome in F_2^6 must be a combination of
at most one column of a parity check matrix of C, which is clearly impossible since there are 2^6 = 64 syndromes but only 13 columns. Hence ρ(C) = 2,
implying that ℓ(6, 2) ≤ 13. Applying Exercise 652(a) shows that 11 ≤ ℓ(6, 2). If there is an
[11, 5] binary code with covering radius 2, there is a [12, 6] binary code also with covering
radius 2 obtained by taking the direct sum of the [11, 5] code with the repetition code of
length 1. It can be shown that there is no [12, 6] binary code with covering radius 2; see
[34, Theorem 8.8]. This verifies that ℓ(6, 2) = 13.
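The covering radius computed in this example can be confirmed exhaustively. In the sketch below the gluing uses the last coordinate of the first copy and the first coordinate of the second (an arbitrary choice, since by Exercise 654 all coordinates of H3 are acceptable):

```python
# Brute-force confirmation of Example 11.5.6: the ADS of two copies of the
# [7,4] Hamming code is a [13,7] code with covering radius exactly 2.
def span(basis):
    words = {0}
    for v in basis:
        words |= {w ^ v for w in words}
    return words

H7 = span([0b1000011, 0b0100101, 0b0010110, 0b0001111])  # [7,4] Hamming code

# Glue the last coordinate of the first copy to the first coordinate of the
# second copy: keep pairs agreeing on the shared bit, then overlap those bits.
ads = {(c1 << 6) | (c2 & 0b111111)
       for c1 in H7 for c2 in H7 if (c1 & 1) == (c2 >> 6)}
assert len(ads) == 2**7                      # a [13,7] code

rho = max(min(bin(v ^ c).count("1") for c in ads) for v in range(2**13))
assert rho == 2                              # so l(6,2) <= 13
```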
We conclude this section with a result, found in [41], that can be used to compute certain
values of ℓ(m, 2); this result is important in showing that there is no [12, 6] binary code
with covering radius 2 as discussed in the preceding example.
Theorem 11.5.7 If C is an [n, n − m] binary code with covering radius 2 and w is the
weight of a nonzero vector in C ⊥ , then w(n + 1 − w) ≥ 2^{m−1}.
Proof: Let x ∈ C ⊥ with wt(x) = w. By permuting coordinates we may assume that C has
the following parity check matrix, where x is the first row and the block [1 · · · 1] has w columns:

H = [ 1 · · · 1   0 · · · 0 ]
    [    H1          H2     ].

Consider a syndrome sT with 1 as the top entry. There are 2^{m−1} such syndromes. Either sT is
one of the first w columns of H , or it is a sum of one of the first w columns of H and one of
the last n − w columns of H since ρ(C) = 2. Therefore w + w(n − w) = w(n + 1 − w) ≥
2^{m−1}.
Exercise 655 In this exercise you will verify that ℓ(5, 2) = 9 as indicated in Table 11.6 in
a different manner from Exercise 648.
(a) Show that ℓ(5, 2) ≤ 9 by showing that the ADS of the [7, 4, 3] Hamming code and the
[3, 1, 3] repetition code is a [9, 4] code with covering radius 2. Hint: See Exercise 654
and Example 11.5.6.
(b) Show that 8 ≤ ℓ(5, 2) using Exercise 652(a).
(c) Let C be an [8, 3] binary code with covering radius 2.
(i) Show that C ⊥ is an [8, 5, d] code with d ≥ 3. Hint: Use Theorem 11.5.7.
(ii) Show that there does not exist an [8, 5, d] binary code with d ≥ 3. Note that this
shows that ℓ(5, 2) = 9.
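The way Theorem 11.5.7 enters part (c)(i) can be seen numerically:

```python
# In Exercise 655(c), C is a hypothetical [8,3] binary code with covering
# radius 2, so n = 8 and m = 5. Theorem 11.5.7 forces every nonzero weight w
# appearing in the dual code to satisfy w(n + 1 - w) >= 2^(m-1) = 16.
allowed = [w for w in range(1, 9) if w * (8 + 1 - w) >= 2**4]
assert allowed == [3, 4, 5, 6]   # so C-perp would be an [8,5,d] code with d >= 3
```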
11.6 Covering radius of subcodes
The relationship between the covering radius of a code and that of a subcode can be quite
complex. Recall by Exercise 618 that if C 0 ⊆ C, then ρ(C) ≤ ρ(C 0 ). However, it is not clear
how large ρ(C 0 ) can be compared with ρ(C). The next two examples
give extremes for these possibilities.
Example 11.6.1 In this example we show how the covering radius of a subcode C 0 could
be as small as the covering radius of the original code. Let H1 be a parity check matrix of a
binary code with covering radius ρ, and let C be the code with parity check matrix given by

H = [ H1   H1 ].

Clearly, ρ(C) = ρ as the list of columns of H is the same as the list of columns of H1 . Let
C 0 be the subcode of codimension 1 in C with parity check matrix

[ H1          H1        ]
[ 0 · · · 0   1 · · · 1 ].

We also have ρ(C 0 ) = ρ, as Exercise 656 shows.
We also have ρ(C 0 ) = ρ, as Exercise 656 shows.
Exercise 656 Show that the codes with parity check matrices

[ H1   H1 ]   and   [ H1          H1        ]
                    [ 0 · · · 0   1 · · · 1 ]

have the same covering radius.
Example 11.6.2 In this example we find a subcode C 0 of codimension 1 of a code C where
ρ(C 0 ) = 2ρ(C) + 1. In the next theorem we will see that this is as large as possible.

Let m ≥ 2 be an integer and let Hm be the m × (2^m − 1) parity check matrix of the
[2^m − 1, 2^m − 1 − m, 3] binary Hamming code Hm . Let Hm′ be the m × (2^m − 2) matrix
obtained from Hm by deleting its column of all 1s. Let r be a positive integer. Then the
rm × (r(2^m − 2) + 1) matrix

H = [ 1   Hm′   O     · · ·   O   ]
    [ 1   O     Hm′   · · ·   O   ]
    [ ⋮   ⋮     ⋮     ⋱       ⋮   ]
    [ 1   O     O     · · ·   Hm′ ],

where each 1 denotes a column of m ones, is a parity check matrix of a code C (which in fact is an ADS of r Hamming codes) of length
r(2^m − 2) + 1 with ρ(C) = r by Exercise 657. The matrix

H0 = [       H        ]
     [ 0 · · · 0   1  ]

is a parity check matrix of a subcode C 0 of codimension 1 in C with covering radius 2r + 1
also by Exercise 657.
Exercise 657 With the notation of Example 11.6.2 do the following:
(a) Prove that ρ(C) = r .
(b) Prove that the syndrome of weight 1 with a 1 in its last position is not the sum of 2r or
fewer columns of H0 .
(c) Prove that ρ(C 0 ) = 2r + 1.
In these last two examples we have produced codes C and subcodes C 0 of codimension
1 where ρ(C 0 ) is as small as possible (namely ρ(C)) and also as large as possible (namely
2ρ(C) + 1) as the following theorem found in [1] shows.
Theorem 11.6.3 (Adams Bound) Let C 0 be a subcode of codimension 1 in a binary linear
code C. Then
ρ(C 0 ) ≤ 2ρ(C) + 1.    (11.14)
Proof: Suppose that C is an [n, n − m] code. Let H be an m × n parity check matrix for
C. We can add one row v = v_1 · · · v_n to H to form an (m + 1) × n parity check matrix

H0 = [ H ]
     [ v ]                                                    (11.15)

for C 0 . For j = 1, 2, . . . , n, let the columns of H0 and H be denoted h_0^j and h^j, respectively.

Let ρ = ρ(C) and ρ_0 = ρ(C 0 ). Assume (11.14) is false; so ρ_0 ≥ 2ρ + 2. There is a
syndrome s_0 ∈ F_2^{m+1} of C 0 which is the sum of ρ_0 columns of H0 but no fewer. Rearrange
the columns of H0 so that

s_0^T = \sum_{j=1}^{ρ_0} h_0^j.                               (11.16)

Partition {1, 2, . . . , ρ_0} into two disjoint sets J_1 and J_2 of cardinality at least ρ + 1; this can
be done since ρ_0 ≥ 2ρ + 2. Hence (11.16) becomes

s_0^T = \sum_{j∈J_1} h_0^j + \sum_{j∈J_2} h_0^j.              (11.17)

As \sum_{j∈J_1} h^j and \sum_{j∈J_2} h^j are syndromes of C and ρ(C) = ρ, there exist sets I_1 and I_2 of
cardinality at most ρ such that

\sum_{j∈I_1} h^j = \sum_{j∈J_1} h^j  and  \sum_{j∈I_2} h^j = \sum_{j∈J_2} h^j.    (11.18)

Suppose that \sum_{j∈I_1} v_j = \sum_{j∈J_1} v_j; then \sum_{j∈J_1} h_0^j = \sum_{j∈I_1} h_0^j, and we can replace
\sum_{j∈J_1} h_0^j in (11.17) by \sum_{j∈I_1} h_0^j. Therefore we have expressed s_0^T as a linear combination of fewer than ρ_0 columns of H0, which is a contradiction. A similar contradiction arises if \sum_{j∈I_2} v_j = \sum_{j∈J_2} v_j. The remaining possibility is \sum_{j∈I_1} v_j ≠ \sum_{j∈J_1} v_j
and \sum_{j∈I_2} v_j ≠ \sum_{j∈J_2} v_j. However, then, \sum_{j∈I_1} v_j + \sum_{j∈I_2} v_j = \sum_{j∈J_1} v_j + \sum_{j∈J_2} v_j;
hence by (11.18) \sum_{j∈I_1} h_0^j + \sum_{j∈I_2} h_0^j = \sum_{j∈J_1} h_0^j + \sum_{j∈J_2} h_0^j, and again we have expressed s_0^T as a linear combination of fewer than ρ_0 columns of H0, a contradiction.
Exercise 658 Let C be the [7, 4, 3] binary Hamming code H3 . Let C 0 be the [7, 3, 4]
subcode of even weight codewords of C.
(a) Give a parity check matrix for C 0 as in (11.15).
(b) Using the parity check matrix and Theorem 11.6.3, find the covering radius of C 0 .
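A brute-force companion to this exercise (using one standard generator matrix for the Hamming code) shows the bound is attained:

```python
# Exercise 658 checked exhaustively: C_0, the even-weight subcode of the
# [7,4] Hamming code C, attains the Adams Bound: rho(C_0) = 2*rho(C) + 1 = 3.
def span(basis):
    words = {0}
    for v in basis:
        words |= {w ^ v for w in words}
    return words

C = span([0b1000011, 0b0100101, 0b0010110, 0b0001111])   # [7,4] Hamming code
C0 = {c for c in C if bin(c).count("1") % 2 == 0}        # [7,3,4] even subcode
assert len(C0) == 8

def covering_radius(code, n):
    return max(min(bin(v ^ c).count("1") for c in code) for v in range(2**n))

assert covering_radius(C, 7) == 1     # Hamming codes are perfect
assert covering_radius(C0, 7) == 3    # equals 2*rho(C) + 1
```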
This result is generalized in [37, 311]; the proof is similar but more technical.
Theorem 11.6.4 Let C 0 be a subcode of codimension i in a binary linear code C. Then
ρ(C 0 ) ≤ (i + 1)ρ(C) + 1.
Furthermore, if C is even, then
ρ(C 0 ) ≤ (i + 1)ρ(C).
Another upper bound on the covering radius of a subcode in terms of the covering
radius of the original code, due to Calderbank [43], is sometimes an improvement of Theorem 11.6.4. Before stating this result, we need some additional terminology. A graph G is a
pair consisting of a set V , called vertices, and a set E of unordered pairs of vertices, called
edges. A coloring of G with a set C of colors is a map f : V → C such that f (x) = f (y)
whenever {x, y} is an edge; that is, two vertices on the same edge do not have the same color,
where the color of a vertex x is f (x). The chromatic number of G is the smallest integer
χ(G) for which G has a coloring using a set C containing χ(G) colors. We are interested
in a specific graph called the Kneser graph K (n, r + 1). The vertex set of K (n, r + 1) is
{v ∈ Fn2 | wt(v) = r + 1}, with vertices u and v forming an edge if and only if their supports
are disjoint. The chromatic number of K (n, r + 1) has been determined by Lovász [208]:
χ(K(n, r + 1)) = n − 2r   if n ≥ 2r + 2.   (11.19)

The Calderbank Bound is as follows.
11.6 Covering radius of subcodes
Theorem 11.6.5 (Calderbank Bound) Let C_0 be a subcode of codimension i ≥ 1 in a binary linear code C of length n. Then

ρ(C_0) ≤ 2ρ(C) + 2^i − 1.   (11.20)
Proof: Let H be a parity check matrix for C. We can add i rows to H to form a parity check matrix

H_0 = [ H
        H_1 ]

for C_0. Let the jth column of H_0 be h_0^j for 1 ≤ j ≤ n.
Let ρ = ρ(C) and ρ_0 = ρ(C_0). If ρ_0 ≤ 2ρ + 1, then (11.20) clearly holds. So assume that ρ_0 ≥ 2ρ + 2. There must be some syndrome s_0^T of C_0 which is a sum of ρ_0 columns of H_0 but no fewer. By rearranging the columns of H_0, we may assume that

s_0^T = Σ_{j=1}^{ρ_0} h_0^j.   (11.21)
Form the Kneser graph K(ρ_0, ρ + 1). This graph has as vertex set V the set of vectors in F_2^{ρ_0} of weight ρ + 1. We write each vector v ∈ F_2^n as (v′, v′′), where v′ ∈ F_2^{ρ_0}. We wish to
establish a coloring f of the vertices of K (ρ0 , ρ + 1). For every v′ ∈ V , there is an xv′ ∈ C
of distance at most ρ from (v′ , 0) as ρ(C) = ρ. For each v′ ∈ V choose one such xv′ ∈ C
and define
f (v′ ) = H1 xTv′ .
Notice that f (v′ ) ∈ Fi2 .
We first establish that H_1 x_{v′}^T ≠ 0. Suppose that H_1 x_{v′}^T = 0. Then H_0 x_{v′}^T = 0 as H x_{v′}^T = 0
because xv′ ∈ C. Therefore,
H0 (v′ , 0)T = H0 ((v′ , 0) − xv′ )T .
(11.22)
Because wt(v′ ) = ρ + 1, the left-hand side of (11.22) is a combination of ρ + 1 columns
of H0 from among the first ρ0 columns of H0 , while the right-hand side of (11.22) is a
combination of at most ρ columns of H_0 as wt((v′, 0) − x_{v′}) ≤ ρ. Thus on the right-hand side of (11.21) we may replace ρ + 1 of the columns h_0^j by at most ρ columns of H_0, indicating that s_0^T is a linear combination of at most ρ_0 − 1 columns of H_0, which is a contradiction. This shows that f(v′) is one of the 2^i − 1 nonzero elements of F_2^i.
We now establish that f is indeed a coloring of K(ρ_0, ρ + 1). Let {v′, w′} be an edge of K(ρ_0, ρ + 1). We must show that H_1 x_{v′}^T ≠ H_1 x_{w′}^T. Suppose that H_1 x_{v′}^T = H_1 x_{w′}^T. Then
as v′ and w′ are in V , where supp(v′ ) and supp(w′ ) are disjoint, wt(v′ + w′ ) = 2ρ + 2.
Additionally, the distance between (v′ + w′ , 0) and xv′ + xw′ is at most 2ρ. Since H1 (xv′ +
xw′ )T = 0 and H (xv′ + xw′ )T = 0 because xv′ + xw′ ∈ C, we have H0 (xv′ + xw′ )T = 0. As
in (11.22),
H0 (v′ + w′ , 0)T = H0 ((v′ + w′ , 0) − (xv′ + xw′ ))T .
(11.23)
Covering radius and cosets
Because wt(v′ + w′ ) = 2ρ + 2, the left-hand side of (11.23) is a combination of 2ρ + 2
columns of H0 from among the first ρ0 columns of H0 , while the right-hand side of (11.23)
is a combination of at most 2ρ columns of H_0 as wt((v′ + w′, 0) − (x_{v′} + x_{w′})) ≤ 2ρ. As above, in the right-hand side of (11.21) we may replace 2ρ + 2 of the columns h_0^j by at most 2ρ columns of H_0, indicating that s_0^T is a linear combination of at most ρ_0 − 2 columns of H_0, which is a contradiction.
Therefore f is a coloring of K(ρ_0, ρ + 1) with at most 2^i − 1 colors. Since ρ_0 ≥ 2ρ + 2, (11.19) implies that

ρ_0 − 2ρ = χ(K(ρ_0, ρ + 1)) ≤ 2^i − 1.
Notice that the Calderbank Bound with i = 1 is the bound of Theorem 11.6.3.
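Lovász's formula (11.19), on which the Calderbank Bound rests, can be spot-checked exhaustively for tiny parameters; K(5, 2) is the Petersen graph. The following brute-force sketch is an illustration only (feasible just for very small n, and not part of the text):

```python
from itertools import combinations, product

def kneser_chromatic(n, k):
    # Vertices of K(n, k): the k-subsets of an n-set; edges join disjoint pairs.
    verts = list(combinations(range(n), k))
    edges = [(i, j) for i, j in combinations(range(len(verts)), 2)
             if not set(verts[i]) & set(verts[j])]
    colors = 1
    while True:
        # Exhaustively try every assignment of `colors` colors to the vertices.
        if any(all(col[i] != col[j] for i, j in edges)
               for col in product(range(colors), repeat=len(verts))):
            return colors
        colors += 1

# K(n, r+1) with n = 5, r = 1: formula (11.19) gives chi = n - 2r = 3.
print(kneser_chromatic(5, 2))
```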
Exercise 659 Let C be a binary code with ρ(C) = 2 and C i a subcode of codimension i.
For which values of i is the upper bound on ρ(C i ) better in Theorem 11.6.4 than in the
Calderbank Bound?
Exercise 660 Let C be a binary code with ρ(C) = 10 and C i a subcode of codimension
i. For which values of i is the upper bound on ρ(C i ) better in Theorem 11.6.4 than in the
Calderbank Bound?
There is another upper bound on ρ(C 0 ) due to Hou [139] that in some cases improves
the bound of Theorem 11.6.4 and the Calderbank Bound. Its proof relies on knowledge of
the graph G(n, s). The graph G(n, s) has vertex set Fn2 ; two vertices v and w form an edge
whenever d(v, w) > s.
The following theorem of Kleitman [172] gives a lower bound on the chromatic number
of G(n, 2r ).
Theorem 11.6.6 Let n and r be positive integers with n ≥ 2r + 1. Then

χ(G(n, 2r)) ≥ 2^n / [ (n choose 0) + (n choose 1) + · · · + (n choose r) ].
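Kleitman's lower bound is easy to evaluate numerically; the short sketch below (illustrative, not part of the text) computes it for a small instance.

```python
from math import comb, ceil

def kleitman_bound(n, r):
    # Theorem 11.6.6: chi(G(n, 2r)) >= 2^n / (C(n,0) + C(n,1) + ... + C(n,r)).
    return 2 ** n / sum(comb(n, i) for i in range(r + 1))

# For n = 5, r = 1 the bound is 32/6, so chi(G(5, 2)) >= 6.
print(ceil(kleitman_bound(5, 1)))
```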
We now state the bound on ρ(C 0 ) due to Hou; we leave the details of the proof as an
exercise.
Theorem 11.6.7 (Hou Bound) Let C_0 be a subcode of codimension i ≥ 1 of a binary linear code C. Let ρ = ρ(C) and ρ_0 = ρ(C_0). Then

2^{ρ_0} / [ (ρ_0 choose 0) + (ρ_0 choose 1) + · · · + (ρ_0 choose ρ) ] ≤ 2^i.   (11.24)
Proof: If ρ0 ≤ 2ρ, then (11.24) is true by Exercise 661. Assume that ρ0 ≥ 2ρ + 1. Let H0
be the parity check matrix of C 0 as in the proof of Theorem 11.6.5. Analogously, define the
same map f from the vertices of G(ρ_0, 2ρ) (instead of K(ρ_0, ρ + 1)) to F_2^i. As before it follows that f is a coloring of G(ρ_0, 2ρ) with 2^i colors, and the result is a consequence of Theorem 11.6.6.
Exercise 661 Show that if ρ0 ≤ 2ρ, then (11.24) is true.
Exercise 662 Fill in the details of the proof of Theorem 11.6.7.
Exercise 663 Let C be a binary code with ρ(C) = 1 and C i a subcode of codimension i.
For each i ≥ 1, which upper bound on ρ(C i ) is better: the bound in Theorem 11.6.4, the
Calderbank Bound, or the Hou Bound?
11.7
Ancestors, descendants, and orphans
In this concluding section of the chapter we examine relationships among coset leaders of a
code C. There is a natural relationship among cosets, and we will discover that all cosets of C
of maximum weight (that is, those of weight ρ(C)) have a special place in this relationship.
We will also introduce the concept of the Newton radius of a code and investigate it in light
of this relationship among cosets.
There is a natural partial ordering ≤ on the vectors in Fn2 as follows. For x and y in Fn2 ,
define x ≤ y provided that supp(x) ⊆ supp(y). If x ≤ y, we will also say that y covers x.
We now use this partial order on Fn2 to define a partial order, also denoted ≤, on the set of
cosets of a binary linear code C of length n. If C 1 and C 2 are two cosets of C, then C 1 ≤ C 2
provided there are coset leaders x_1 of C_1 and x_2 of C_2 such that x_1 ≤ x_2. As usual, C_1 < C_2 means that C_1 ≤ C_2 but C_1 ≠ C_2. Under this partial ordering the set of cosets of C has a
unique minimal element, the code C itself.
Example 11.7.1 Let C be the [5, 2, 3] binary code with generator matrix

[ 1 1 1 1 0 ]
[ 0 0 1 1 1 ].

Then the cosets of C with coset leaders are:
C_0 = 00000 + C,    C_4 = 00010 + C,
C_1 = 10000 + C,    C_5 = 00001 + C,
C_2 = 01000 + C,    C_6 = 10100 + C = 01010 + C,
C_3 = 00100 + C,    C_7 = 10010 + C = 01100 + C.
The partial ordering of the cosets is depicted by the following Hasse diagram: C_0 is the unique minimal element at the bottom; C_1, C_2, C_3, C_4, and C_5 form the middle level, each joined to C_0; and C_6 and C_7 sit at the top, each joined to C_1, C_2, C_3, and C_4 (but not to C_5).

            C_6      C_7

    C_1   C_2   C_3   C_4   C_5

            C_0
In this diagram notice that the coset weights go up in steps of one as you go up a chain
starting with weight 0 at the very bottom.
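The coset computation above can be reproduced mechanically. The following sketch (an illustration, not part of the text; vectors are stored as 5-bit integer masks with the leftmost coordinate as the most significant bit) spans the code of Example 11.7.1 from its generator matrix and prints the weight and coset leaders of each coset.

```python
# Cosets and coset leaders of the [5,2,3] code of Example 11.7.1.
G = [0b11110, 0b00111]                      # generator rows as 5-bit masks
code = {0}
for g in G:
    code |= {c ^ g for c in code}           # span the four codewords

cosets = {}
for v in range(32):
    rep = min(v ^ c for c in code)          # canonical label of the coset v + C
    cosets.setdefault(rep, set()).add(v)

for rep in sorted(cosets):
    vecs = cosets[rep]
    w = min(bin(v).count("1") for v in vecs)
    leaders = sorted(format(v, "05b") for v in vecs if bin(v).count("1") == w)
    print(w, leaders)
```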
Exercise 664 List the cosets of the code C with generator matrix given below. Also give a pictorial representation of the partial ordering on these cosets as in Example 11.7.1.

(a) [ 1 1 1 1 0 0 ]
    [ 0 0 1 1 1 1 ],

(b) [ 1 0 0 0 1 1 ]
    [ 0 1 0 1 0 1 ]
    [ 0 0 1 1 1 0 ].
If C 1 and C 2 are cosets of C with C 1 < C 2 , then C 1 is called a descendant of C 2 , and
C 2 is an ancestor of C 1 . If C 1 is a descendant of C 2 , then clearly wt(C 1 ) ≤ wt(C 2 ) − 1; if
C 1 < C 2 with wt(C 1 ) = wt(C 2 ) − 1, then C 1 is a child of C 2 , and C 2 is a parent of C 1 .1 By
Theorem 11.1.6(v), if C 1 < C 2 with wt(C 1 ) < wt(C 2 ) − 1, then there is a coset C 3 such that
C 1 < C 3 < C 2 . A coset of C is called an orphan provided it has no parents, that is, provided
it is a maximal element in the partial ordering of all cosets of C. Each coset of C of weight
ρ(C) is clearly an orphan but the converse does not hold in general. The presence of orphans
with weight less than ρ(C) contributes towards the difficulty in computing the covering
radius of a code.
Example 11.7.2 In Example 11.7.1, ρ(C) = 2. The code C is a descendant of all other
cosets and the child of C i for 1 ≤ i ≤ 5. C 6 and C 7 are parents of C i for 1 ≤ i ≤ 4. C 5 , C 6 ,
and C 7 are orphans; however, the weight of C 5 is not ρ(C).
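The orphan count of Example 11.7.2 can also be confirmed computationally. In this sketch (illustrative only) a coset is declared an orphan when no coset of weight one greater has a leader covering one of its leaders; support containment x ≤ y of bitmasks is the test `x & ~y == 0`.

```python
# Orphans of the [5,2,3] code of Examples 11.7.1 and 11.7.2.
G = [0b11110, 0b00111]
code = {0}
for g in G:
    code |= {c ^ g for c in code}

cosets = {}
for v in range(32):
    cosets.setdefault(min(v ^ c for c in code), set()).add(v)

def leaders(vecs):
    w = min(bin(v).count("1") for v in vecs)
    return [v for v in vecs if bin(v).count("1") == w], w

orphans = 0
for vecs in cosets.values():
    lead, w = leaders(vecs)
    # A parent would be a coset of weight w + 1 with a leader covering ours.
    has_parent = any(
        w2 == w + 1 and any(x & ~y == 0 for x in lead for y in lead2)
        for lead2, w2 in map(leaders, cosets.values())
    )
    orphans += not has_parent
print(orphans)  # 3 orphans: C5, C6, and C7
```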
Exercise 665 Give examples of parents and children in the partially ordered sets for the
codes in Exercise 664. Which cosets have the most parents? Which cosets have the most
children? Which cosets are orphans? Which orphans have weight equal to ρ(C)?
We now present two results dealing with the basic descendant/ancestor relationship
among cosets. The first deals with weights of descendants illustrating a type of “balance”
in the partial ordering.
Theorem 11.7.3 Let C be a binary linear code and let C ′ be a coset of weight w of C. Let
a be an integer with 0 < a < w. Then the number of descendants of C ′ of weight a equals
the number of descendants of C ′ of weight w − a.
Proof: By Theorem 11.1.6(v), the coset leaders of the descendants of weight a (respectively,
w − a) of C ′ are the vectors of weight a (respectively, of weight w − a) covered by some
vector of C ′ of weight w, as the latter are precisely the coset leaders of C ′ . Let x be a coset
leader of weight a of a descendant of C ′ . Then there exists a coset leader x′ of C ′ such
that x ≤ x′ ; as x′ covers x, x′ − x is a coset leader of weight w − a of a descendant of C ′ .
Furthermore, every coset leader of weight w − a of a descendant of C ′ arises in this manner,
as seen by reversing the roles of a and w − a in the above. Therefore there is a one-to-one
¹ We have used the terms “parent” and “child” in relation to self-dual codes in Chapter 9 and earlier in this chapter. The current use of “parent” and “child” is unrelated.
correspondence between the coset leaders of weight a of descendants of C ′ and those of
weight w − a.
Suppose x and y are coset leaders of weight a of the same descendant of C′. Let x′ and y′ be vectors of weight w in C′ that cover x and y, respectively. Then y − x ∈ C, and since
x′ − y′ is also in C, so is
(x′ − x) − (y′ − y) = (x′ − y′ ) + (y − x).
Therefore x′ − x and y′ − y are coset leaders of the same coset. Thus there are at least as
many descendants of C ′ of weight a as there are of weight w − a. Again by interchanging
the roles of a and w − a, there are at least as many descendants of C ′ of weight w − a as
there are of weight a.
Example 11.7.4 We showed in Example 8.3.2 that the covering radius of the [24, 12, 8]
extended binary Golay code is 4. Furthermore, every coset of weight 4 contains exactly
six coset leaders; in pairs their supports are disjoint. Each of these coset leaders covers
exactly four coset leaders of weight 1 cosets. Because the cosets of weight 1 have a unique
coset leader, there are 24 cosets of weight 1 that are descendants of a weight 4 coset. By
Theorem 11.7.3 there are 24 cosets of weight 3 that are children of a given coset of weight 4.
Exercise 666 Describe the form of the coset leaders of the 24 cosets of weight 3 that are
descendants of a fixed weight 4 coset in the [24, 12, 8] extended binary Golay code as noted
in Example 11.7.4.
If C 1 and C 2 are two cosets of C with C 1 < C 2 , then the following theorem shows how to
obtain all of the coset leaders of C 1 from one coset leader of C 1 and all of the coset leaders
of C 2 [36].
Theorem 11.7.5 Let C 1 and C 2 be two cosets of a binary linear code C and assume that
C 1 < C 2 . Let x1 and x2 be coset leaders of C 1 and C 2 , respectively, satisfying x1 < x2 . Let
u = x2 − x1 . Then a vector x is a coset leader of C 1 if and only if there is a coset leader y
of C 2 such that u ≤ y and x = y − u.
Proof: First suppose that y is a coset leader of C 2 with u ≤ y. Then
(y − u) − x1 = y − (u + x1 ) = y − x2 ∈ C,
and thus y − u is in C 1 . Since
wt(y − u) = wt(y) − wt(u) = wt(x2 ) − wt(u) = wt(x1 ),
y − u is a coset leader of C 1 .
Conversely, suppose that x is a coset leader of C 1 . Then
(x + u) − x2 = (x + x2 − x1 ) − x2 = x − x1 ∈ C,
and thus x + u is in C 2 . As
wt(x + u) ≤ wt(x) + wt(u) = wt(x1 ) + wt(u) = wt(x2 ),
x + u is a coset leader of C 2 .
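Theorem 11.7.5 can be illustrated on the code of Example 11.7.1, taking C_1 = 10000 + C and C_2 = C_6 = 10100 + C (choices made here purely for illustration): with x_1 = 10000, x_2 = 10100, and u = x_2 − x_1 = 00100, the coset leaders of C_1 are exactly {y − u : y a leader of C_2 with u ≤ y}. A sketch:

```python
# Theorem 11.7.5 on the [5,2,3] code of Example 11.7.1: recover every coset
# leader of C1 = 10000 + C from the coset leaders of its ancestor C6 = 10100 + C.
G = [0b11110, 0b00111]
code = {0}
for g in G:
    code |= {c ^ g for c in code}

def leaders(v):
    coset = {v ^ c for c in code}
    w = min(bin(u).count("1") for u in coset)
    return {u for u in coset if bin(u).count("1") == w}

x1, x2 = 0b10000, 0b10100          # x1 <= x2, so C1 < C6
u = x1 ^ x2                        # u = x2 - x1 (over F_2)
recovered = {y ^ u for y in leaders(x2) if y & u == u}   # {y - u : u <= y}
print(recovered == leaders(x1))    # True
```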
Exercise 667 If u ≤ y and x = y − u, show that x ≤ x + u and u ≤ x + u.
Example 11.7.6 Let C be the [24, 12, 8] extended binary Golay code. Let C 2 = x2 + C be
a coset of weight 4. Choose any vector x1 of weight 3 with x1 ≤ x2 . Let C 1 = x1 + C. Then,
in the notation of Theorem 11.7.5, u is a weight 1 vector with supp(u) ⊂ supp(x2 ). As all
other coset leaders of C 2 have supports disjoint from supp(x2 ), x1 is the only coset leader of
C 1 by Theorem 11.7.5, a fact we already knew as all cosets of weight t = ⌊(d − 1)/2⌋ = 3
or less have unique coset leaders.
The following corollaries are immediate consequences of Theorem 11.7.5.
Corollary 11.7.7 Let C be a binary code and C ′ a coset of C with coset leader x. Let
i ∈ supp(x) and let u be the vector with 1 in coordinate i and 0 elsewhere. Then x − u is a
coset leader of a child of C ′ , and no coset leader of (x − u) + C has i in its support.
Proof: By Theorem 11.1.6(v), x − u is a coset leader of a child of C ′ . By Theorem 11.7.5,
every coset leader of this child is of the form y − u, where u ≤ y; in particular, no coset
leader of the child has i in its support.
Corollary 11.7.8 [129] Let C 1 and C 2 be two cosets of a binary linear code C and assume
that C 1 < C 2 . Then for each coset leader x of C 1 there is a coset leader y of C 2 such that
x < y.
Exercise 668 Prove Corollary 11.7.8.
Example 11.7.9 It follows from Corollary 11.7.8 that to determine all the ancestors of
a coset C ′ it suffices to know only one leader of C ′ . From that one coset leader x every
coset leader of an ancestor has support containing supp(x). However, to determine all the
descendants of C′, it does not suffice in general to know only one leader x of C′, in that the coset leader of a descendant of C′ may not have support contained in supp(x). In Example 11.7.1,
a coset leader of C 1 has support contained in the support of one of the two coset leaders of
C 1 ’s ancestor C 6 . However, the coset leader 01010 of C 6 does not have support containing
the support of the unique coset leader of C 6 ’s descendant C 1 .
The following theorem contains a characterization of orphans [33, 36].
Theorem 11.7.10 Let C be an [n, k, d] binary code. Let C ′ be a coset of C of weight w. The
following hold:
(i) C ′ is an orphan if and only if each coordinate position is covered by a vector in C ′ of
weight w or w + 1.
(ii) If d > 2, then there is a one-to-one correspondence between the children of C ′ and the
coordinate positions covered by the coset leaders of C ′ .
(iii) If d > 2 and C is an even code, then there is a one-to-one correspondence between
the parents of C ′ and the coordinate positions not covered by the coset leaders of C ′ .
In particular, C ′ is an orphan if and only if each coordinate position is covered by a
coset leader of C ′ .
Proof: For 1 ≤ i ≤ n, let ei ∈ Fn2 denote the vector of weight 1 with a 1 in coordinate i.
By Corollary 11.7.8, each parent of C ′ equals C ′ + ei for some i. In order for C ′ to be an
Table 11.7 Coset weight distribution of R(1, 4)

Coset                 Number of vectors of given weight                     Number
weight   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  of cosets
  0      1                              30                               1        1
  1          1                      15      15                       1           16
  2              1              7       16       7               1              120
  3                  1      3       12      12       3       1                  560
  4                      2      8       12       8       2                      840
  4                      4              24               4                       35
  5                         6       10      10       6                          448
  6                             16              16                               28
orphan, C ′ + ei must contain vectors of weight w or less for all i. This is only possible if,
for each i, there is a vector in C ′ of weight either w or w + 1 with a 1 in coordinate i. This
proves (i).
Now assume that d > 2. Then for i ≠ j, the cosets C′ + e_i and C′ + e_j are distinct. By definition, every child of C′ equals C′ + e_i for some i. Such a coset is a child if and only if some coset leader of C′ has a 1 in coordinate i. This proves (ii).
Suppose now that C is even and d > 2. As in the proof of (i), each parent of C′ equals C′ + e_i for some i. In order for C′ + e_i to be a parent of C′, every vector in C′ of weight w or w + 1 must not have a 1 in coordinate i. As C is even, a coset has only even weight vectors or only odd weight vectors; thus C′ has no vectors of weight w + 1. The first part of (iii) now follows, noting that C′ + e_i ≠ C′ + e_j if i ≠ j. The second part of (iii) follows directly from the first.
We leave the proof of the following corollary as an exercise.
Corollary 11.7.11 Let C be an [n, k, d] even binary code with d > 2. The following hold:
(i) An orphan of C has n descendants of weight 1 and n children.
(ii) A coset of weight w, which has exactly t coset leaders where tw < n, is not an orphan.
Exercise 669 Prove Corollary 11.7.11.
Example 11.7.12 Table 11.7 presents the coset weight distribution of the [16, 5, 8] Reed–
Muller code R(1, 4). In a series of exercises, you will be asked to verify the entries in the
table. By Theorem 11.2.5, ρ(R(1, 4)) = 6. There are orphans of R(1, 4) of weights 4 and
6; see Exercise 676. The presence of these orphans and their weight distribution is part of
a more general result found in [36].
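The entries of Table 11.7, which Exercises 670 through 675 verify by hand, can also be confirmed by brute force, since R(1, 4) has only 32 codewords and F_2^16 has 65 536 vectors. In the sketch below (illustrative only; it takes a few seconds in pure Python) codewords are 16-bit integers obtained by evaluating the affine Boolean functions a_0 + a_1x_1 + · · · + a_4x_4 on the 16 points of F_2^4.

```python
from itertools import product
from collections import Counter

# Build R(1,4): evaluate every affine function on the 16 points of F_2^4.
code = set()
for a0, a1, a2, a3, a4 in product((0, 1), repeat=5):
    word = 0
    for p in range(16):
        x = [(p >> i) & 1 for i in range(4)]
        val = (a0 + a1 * x[0] + a2 * x[1] + a3 * x[2] + a4 * x[3]) % 2
        word |= val << p
    code.add(word)

# Sort all 2^16 vectors into the 2048 cosets of R(1,4).
cosets = {}
for v in range(1 << 16):
    cosets.setdefault(min(v ^ c for c in code), []).append(v)

# Tally cosets by weight, splitting weight-4 cosets by their number of leaders.
profile = Counter()
for vecs in cosets.values():
    w = min(bin(u).count("1") for u in vecs)
    nleaders = sum(1 for u in vecs if bin(u).count("1") == w)
    profile[(w, nleaders if w == 4 else None)] += 1
print(sorted(profile.items()))
```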
Exercise 670 By Theorem 1.10.1, R(1, 4)⊥ is the [16, 11, 4] Reed–Muller code R(2, 4).
(a) By Theorem 1.10.1, R(1, 4) is a [16, 5, 8] code containing the all-one vector. Show that
the weight distribution of R(1, 4) is A0 = A16 = 1 and A8 = 30, verifying the entries
in first row of Table 11.7.
(b) Show that the only possible weights of nonzero codewords in R(1, 4)⊥ are 4, 6, 8, 10,
12, and 16.
(c) Using the Assmus–Mattson Theorem, show that the 30 weight 8 codewords of R(1, 4)
hold a 3-(16, 8, 3) design and draw its Pascal triangle.
(d) Verify the weight distribution of the cosets of weights 1, 2, and 3 in Table 11.7. Also
verify that the number of such cosets is as indicated in the table. Hint: See Example
8.4.6.
Exercise 671 By Exercise 670 we have verified that the entries in Table 11.7 for the
cosets of R(1, 4) of weights 1 and 3 are correct. In this exercise, you will verify that the
information about weight 5 cosets is correct. By Exercise 670(b) and Theorem 7.5.2(iv),
the weight distribution of a coset of weight 5 is uniquely determined.
(a) If there are x weight 5 vectors in a coset of weight 5, show that there are 16 − x vectors
of weight 7 in the coset.
(b) How many vectors of weights 5 and 7 are there in F_2^16?
(c) Let n be the number of cosets of weight 5. Since ρ(R(1, 4)) = 6, each weight 5 and weight 7 vector in F_2^16 is in a coset of weight 1, 3, or 5. Counting the total number of weight 5 and weight 7 vectors, find two equations relating x and n. Solve the equations to verify the information about weight 5 cosets in Table 11.7.
Exercise 672 Use Exercise 631 to verify that the weight distribution of a weight 6 coset
of R(1, 4) is as in Table 11.7.
Exercise 673 In this exercise we show that any coset of weight 4 in R(1, 4) has either two
or four coset leaders.
(a) Show that two different coset leaders in the same weight 4 coset of R(1, 4) have disjoint
supports.
(b) Show that the maximum number of coset leaders in a weight 4 coset of R(1, 4) is 4.
(c) Show that if a weight 4 coset of R(1, 4) has at least three coset leaders, it actually has
four coset leaders.
(d) Let v + R(1, 4) be a coset of R(1, 4) of weight 4 whose only coset leader is v. Suppose
that supp(v) = {i 1 , i 2 , i 3 , i 4 }.
(i) Show that supp(v) is not contained in the support of any weight 8 codeword of
R(1, 4).
(ii) Using Exercise 670(c), show that there are three weight 8 codewords of R(1, 4)
with supports containing {i 2 , i 3 , i 4 }.
(iii) By (i), the supports of the three codewords of (ii) cannot contain i 1 . Show that this
is impossible.
Exercise 674 By Exercise 670(b) and Theorem 7.5.2(iii), once we know the number of
coset leaders of a weight 4 coset of R(1, 4), the distribution of that coset is unique. By
Exercise 673, there are either two or four coset leaders in a weight 4 coset. Verify the
weight distributions of the weight 4 cosets given in Table 11.7. Hint: See Example 7.5.4. If
D = C ∪ (v + C), where v is a coset leader of weight 4, then D⊥ ⊆ C ⊥ = R(2, 4), which
has minimum weight 4.
Exercise 675 In this exercise, you will verify that there are 35 weight 4 cosets of R(1, 4)
that have four coset leaders. From that you can compute the number of cosets of weight 4
with two coset leaders and the number of cosets of weight 6.
(a) Let x0 + R(1, 4) be a coset of weight 4 with coset leaders x0 , x1 , x2 , and x3 . Show
that c1 = x0 + x1 and c2 = x0 + x2 are weight 8 codewords in R(1, 4) whose supports
intersect in the set supp(x0 ) and in a natural way determine the supports of x1 , x2 ,
and x3 .
(b) Conversely, show how to obtain four coset leaders of a weight 4 coset by considering
the intersection of the supports of two weight 8 codewords of R(1, 4) whose supports
are not disjoint.
(c) There are 15 pairs of codewords of weight 8 in R(1, 4) where the weight 8 codewords
in each pair have disjoint supports. Choose two distinct pairs. Show how these pairs
determine the four coset leaders of a weight 4 coset of R(1, 4) and also determine a
third pair of weight 8 codewords.
(d) Show that there are 35 weight 4 cosets of R(1, 4) with four coset leaders.
(e) Show that there are 840 weight 4 cosets of R(1, 4) with two coset leaders.
(f) Show that there are 28 weight 6 cosets of R(1, 4).
Exercise 676 Do the following:
(a) Show that every weight 6 coset of R(1, 4) has exactly 16 children and that every weight
5 coset has exactly one parent.
(b) Show that every weight 4 coset of R(1, 4) with four coset leaders is an orphan, but
those with two coset leaders are not orphans.
(c) Show that every weight 4 coset of R(1, 4) with four coset leaders has exactly 16 children
and that every weight 3 coset has precisely one parent that is an orphan and 12 parents
that are not orphans.
We conclude this chapter with a brief introduction to the Newton radius first defined in
[123]. The Newton radius ν(C) of a binary code C is the largest value ν so that there exists
a coset of weight ν with only one coset leader. In particular, ν(C) is the largest weight of
any error that can be uniquely corrected.
The first statement in the following lemma is from [123].
Lemma 11.7.13 Let C be an [n, k, d] binary code. The following hold:
(i) ⌊(d − 1)/2⌋ ≤ ν(C) ≤ ρ(C).
(ii) If C is perfect, (d − 1)/2 = ν(C) = ρ(C).
(iii) If C is even and d > 2, then ν(C) < ρ(C).
Proof: Proofs of (i) and (ii) are left as an exercise. Theorem 11.7.10(iii) implies that orphans
in even codes have more than one coset leader. In particular, cosets of weight ρ(C) cannot
have unique coset leaders.
Exercise 677 Prove parts (i) and (ii) of Lemma 11.7.13.
Example 11.7.14 Let C be the [15, 4, 8] simplex code. By Theorem 11.2.7, ρ(C) = 7.
It can be shown [123] that ν(C) = 4. So in this example, we have strict inequality in
Lemma 11.7.13(i).
We can use the ordering of binary vectors already developed in this section to simplify
the proof of the following result from [123].
Theorem 11.7.15 If C is an [n, k] binary code, then
0 ≤ ρ(C) − ν(C) ≤ k.
Proof: That 0 ≤ ρ(C) − ν(C) follows from Lemma 11.7.13. Let ei be the binary vector of
length n and weight 1 with 1 in coordinate i. Let x1 be a weight ρ = ρ(C) coset leader of
a coset C 1 of C. By rearranging coordinates we may assume that supp(x1 ) = {1, 2, . . . , ρ}.
Then either ν(C) = ρ, in which case we are done, or ν(C) ≤ ρ − 1 by Lemma 11.7.13.
Suppose the latter occurs. Then there exists another coset leader x2 of C 1 . Hence x1 + x2 =
c1 ∈ C. By rearranging coordinates we may assume that 1 ∈ supp(c1 ), since x1 and x2 must
disagree on some coordinate. By Corollary 11.7.7, there is a child C 2 of C 1 with all coset
leaders having first coordinate 0, one of which is x3 = e1 + x1 . If ν(C) ≤ ρ − 2, there
is a coset leader x4 of C 2 that must disagree with x3 on some coordinate in supp(x3 ) =
{2, 3, . . . , ρ}; by rearranging, we may assume that c2 = x3 + x4 ∈ C is 1 on the second
coordinate. Since all coset leaders of C 2 are 0 on the first coordinate, so is c2 . Since c1 begins
1 · · · and c2 begins 01 · · · , they are independent. By Corollary 11.7.7, there is a child C 3 of
C 2 with all coset leaders having second coordinate 0, one of which is x5 = e2 + x3 . If C 3
has a coset leader that is 1 on the first coordinate, then C 2 must have a coset leader with a 1
in the coordinate by Corollary 11.7.8, which is a contradiction. So all coset leaders of C 3 are
0 on the first and second coordinates. If ν(C) ≤ ρ − 3, there is a coset leader x6 of C 3 which
must disagree with x5 on some coordinate in supp(x5 ) = {3, 4, . . . , ρ}; by rearranging, we
may assume that c3 = x5 + x6 ∈ C is 1 on the third coordinate. Since c1 begins 1 · · · , c2
begins 01 · · · , and c3 begins 001 · · · , they are independent. We can continue inductively to
find s independent codewords as long as ν(C) ≤ ρ − s. Since the largest value of s possible
is k, ν(C) ≥ ρ − k.
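For the small code of Example 11.7.1 the quantities in Theorem 11.7.15 can be computed directly: ρ(C) = 2, ν(C) = 1, and k = 2, so 0 ≤ ρ(C) − ν(C) ≤ k holds. A sketch of the check (illustrative only; vectors as 5-bit masks):

```python
# rho(C) and nu(C) for the [5,2,3] code of Example 11.7.1: nu is the largest
# weight of a coset with a unique leader, rho the largest coset weight.
G = [0b11110, 0b00111]
code = {0}
for g in G:
    code |= {c ^ g for c in code}

rho = nu = 0
for rep in {min(v ^ c for c in code) for v in range(32)}:
    coset = [rep ^ c for c in code]
    w = min(bin(u).count("1") for u in coset)
    rho = max(rho, w)
    if sum(1 for u in coset if bin(u).count("1") == w) == 1:
        nu = max(nu, w)
print(rho, nu)  # 2 1
```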
Exercise 678 In the proof of Theorem 11.7.15, carry out one more step in the induction.
12
Codes over Z4
The study of codes over the ring Z4 attracted great interest through the work of Calderbank,
Hammons, Kumar, Sloane, and Solé in the early 1990s which resulted in the publication
of a paper [116] showing how several well-known families of nonlinear binary codes were
intimately related to linear codes over Z4. In particular, for m ≥ 4 an even integer, there is a family K(m) of (2^m, 2^{2m}, 2^{m−1} − 2^{(m−2)/2}) nonlinear binary codes, originally discovered by Kerdock [166], and a second family P(m) of (2^m, 2^{2^m − 2m}, 6) nonlinear binary codes,
due to Preparata [288], which possess a number of remarkable properties. First, the codes
K(4) and P(4) are identical to one another and to the Nordstrom–Robinson code described
in Section 2.3.4. Second, the Preparata codes have the same length as the extended double
error-correcting BCH codes but have twice as many codewords. Third, and perhaps most
intriguing, K(m) and P(m) behave as if they are duals of each other in the sense that the weight enumerator of P(m) is the MacWilliams transform of the weight enumerator of K(m); in other words, if C = K(m), then the weight enumerator of P(m) is (1/|C|)·W_C(y − x, y + x) (see (M_3) of Section 7.2). A number of researchers had attempted to explain this curious
relationship between weight enumerators without success until the connection with codes
over Z4 was made in [116].
In this chapter we will present the basic theory of linear codes over Z4 including the
connection between these codes and binary codes via the Gray map. We will also study
cyclic, self-dual, and quadratic residue codes over Z4 . The codes of Kerdock and Preparata
will be described as the Gray image of certain extended cyclic codes over Z4 . In order to
present these codes we will need to examine the Galois ring GR(4m ), which plays the same
role in the study of cyclic codes over Z4 as F2m (also denoted GF(2m )) does in the study of
cyclic codes over F2 .
12.1
Basic theory of Z4 -linear codes
A Z4-linear code¹ C of length n is an additive subgroup of Z_4^n. Such a subgroup is a Z4-module, which may or may not be free. (A Z4-module M is free if there exists a subset B
of M, called a basis, such that every element in M is uniquely expressible as a Z4 -linear
combination of the elements in B.) We will still term elements of Zn4 “vectors” even though
¹ Codes over Z4 are sometimes called quaternary codes in the literature. Unfortunately, codes over F4 are also called quaternary, a term we used in Chapter 1. In this book we reserve the term “quaternary” to refer to codes over F4.
Z_4^n is not a vector space. Note that if a vector v has components that are all 0s or 2s, then 2v = 0, implying that such a vector cannot be in a basis of a free submodule of Z_4^n since 2 ≠ 0. In describing codes over Z4, we denote the elements of Z4 in either of the natural
forms {0, 1, 2, 3} or {0, 1, 2, −1}, whichever is most convenient.
Example 12.1.1 Let C be the Z4 -linear code of length 4 with the 16 codewords:
0000, 1113, 2222, 3331, 0202, 1311, 2020, 3133,
0022, 1131, 2200, 3313, 0220, 1333, 2002, 3111.
This is indeed an additive subgroup of Z_4^4 as Exercise 679 shows. If this were a free Z4-module, it would have a basis of two vectors b_1 and b_2 such that every codeword would be
a unique Z4 -linear combination of b1 and b2 . As described above, both b1 and b2 have at
least one component equal to 1 or 3. However, if b1 and b2 are among the eight codewords
with one component 1 or 3, then 2b1 = 2222 = 2b2 . Hence {b1 , b2 } cannot be a basis of
C implying C is not free. However, we can still produce a perfectly good generator matrix
whose rows function for all intents and purposes as a “basis” for the code.
Exercise 679 Let C be the code over Z4 of Example 12.1.1.
(a) Show that every codeword of C can be written uniquely in the form xc1 + yc2 + zc3 ,
where c1 = 1113, c2 = 0202, c3 = 0022, x ∈ Z4 , and y and z are in {0, 1}.
(b) Use part (a) to show that C is a Z4 -linear code.
Example 12.1.2 Let C be the code over Z4 of Example 12.1.1. By Exercise 679(a), we can consider

G = [ 1 1 1 3 ]
    [ 0 2 0 2 ]
    [ 0 0 2 2 ]

as a generator matrix for C in the following sense: every codeword of C is (x y z)G for some x in Z4, and y and z in Z2.
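The enumeration implicit in Example 12.1.2 is easy to carry out mechanically; the sketch below (an illustration, not part of the text) spans the code of Example 12.1.1 from this generator matrix and confirms that it has 4·2·2 = 16 distinct codewords, matching its type 4^1 2^2.

```python
from itertools import product

# Enumerate the Z4-linear code of Example 12.1.1 from the generator matrix of
# Example 12.1.2: the codewords are (x y z)G with x in Z4 and y, z in Z2.
G = [(1, 1, 1, 3), (0, 2, 0, 2), (0, 0, 2, 2)]

def combine(coeffs):
    # Z4-linear combination of the rows of G.
    return tuple(sum(a * g[j] for a, g in zip(coeffs, G)) % 4 for j in range(4))

code = {combine((x, y, z)) for x in range(4) for y in (0, 1) for z in (0, 1)}
print(len(code))  # 16 codewords, matching type 4^1 2^2
```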
As Exercise 679 illustrates, every Z4-linear code C contains a set of k_1 + k_2 codewords c_1, . . . , c_{k_1}, c_{k_1+1}, . . . , c_{k_1+k_2} such that every codeword in C is uniquely expressible in the form

Σ_{i=1}^{k_1} a_i c_i + Σ_{i=k_1+1}^{k_1+k_2} a_i c_i,
where ai ∈ Z4 for 1 ≤ i ≤ k1 and ai ∈ Z2 for k1 + 1 ≤ i ≤ k1 + k2 . Furthermore, each ci
has at least one component equal to 1 or 3 for 1 ≤ i ≤ k1 and each ci has all components
equal to 0 and 2 for k1 + 1 ≤ i ≤ k1 + k2 . If k2 = 0, the code C is a free Z4 -module. The
matrix whose rows are ci for 1 ≤ i ≤ k1 + k2 is called a generator matrix for C. The code
C has 4k1 2k2 codewords and is said to be of type 4k1 2k2 . The code of Example 12.1.1 has
type 41 22 .
As with codes over fields, we can describe equivalence of Z4 -linear codes. Let C 1 and C 2
be two Z4 -linear codes of length n. Then C 1 and C 2 are permutation equivalent provided
there is an n × n permutation matrix P such that C 2 = C 1 P. We can also scale components
of a code to obtain an equivalent code provided the scalars are invertible elements of Z4 .
To that end, we say that C 1 and C 2 are monomially equivalent provided there is an n × n
monomial matrix M with all nonzero entries equal to 1 or 3 such that C 2 = C 1 M. The
permutation automorphism group PAut(C) of C is the set of all permutation matrices P
such that C P = C, while the monomial automorphism group MAut(C) of C is the set of all
monomial matrices M (with nonzero entries 1 and 3) such that C M = C.
Exercise 680 Show that the Z4-linear codes with generator matrices

G_1 = [ 1 1 1 3 ]           G_2 = [ 1 1 1 1 ]
      [ 0 2 0 2 ]   and           [ 2 0 0 2 ]
      [ 0 0 2 2 ]                 [ 0 2 0 2 ]

are monomially equivalent.
Exercise 681 Find PAut(C) and MAut(C), where C is the code in Example 12.1.1.
Every [n, k] linear code over Fq is permutation equivalent to a code with generator matrix
in standard form [Ik A] by Theorem 1.6.2. In a similar way, there is a standard form for
the generator matrix of a Z4 -linear code. A generator matrix G of a Z4 -linear code C is in
standard form if

G = [ I_{k_1}   A         B_1 + 2B_2 ]
    [ O         2I_{k_2}  2C         ],   (12.1)
where A, B1 , B2 , and C are matrices with entries from Z2 , and O is the k2 × k1 zero
matrix. The code C is of type 4k1 2k2 . Notice that the generator matrix for the code of
Example 12.1.1 is in standard form. Although rather tedious, the proof of the following
result is straightforward.
Theorem 12.1.3 Every Z4 -linear code is permutation equivalent to a code with generator
matrix in standard form.
Exercise 682 Prove Theorem 12.1.3.
There is a natural inner product, the ordinary dot product modulo 4, on Zn4 defined by
x · y = x1 y1 + x2 y2 + · · · + xn yn (mod 4),
where x = x1 · · · xn and y = y1 · · · yn . As with linear codes over finite fields, we can define
the dual code C ⊥ of a Z4 -linear code C of length n by C ⊥ = {x ∈ Zn4 | x · c = 0 for all c ∈
C}. By Exercise 683, C ⊥ is indeed a Z4 -linear code. A Z4 -linear code is self-orthogonal
when C ⊆ C ⊥ and self-dual if C = C ⊥ .
Exercise 683 Prove that the dual code of a Z4 -linear code is a Z4 -linear code.
Exercise 684 Prove that the code of Example 12.1.1 is self-dual.
Example 12.1.4 Unlike codes over finite fields, there are self-dual Z4 -linear codes of odd
length. For example, the code of length n and type 2n with generator matrix 2In is self-dual.
Exercise 685 Find generator matrices in standard form for all self-dual codes of lengths
n ≤ 4, up to monomial equivalence. Give the type of each code.
If C has generator matrix in standard form (12.1), then C ⊥ has generator matrix
G⊥ = [ −(B1 + 2B2)^T − C^T A^T    C^T     I_{n−k1−k2} ]
     [ 2A^T                       2I_k2   O           ],          (12.2)

where O is the k2 × (n − k1 − k2 ) zero matrix. In particular, C⊥ has type

4^{n−k1−k2} 2^{k2} .                                              (12.3)
Exercise 686 Give the generator matrix (12.2) of C ⊥ for the code C of Example 12.1.1.
Exercise 687 Prove that G ⊥ G T is the zero matrix, where G is given in (12.1) and G ⊥ is
given in (12.2).
Unlike codes over F2 , the concept of weight of a vector in Zn4 can have more than one
meaning. In fact, three different weights, and hence three different distances, are used
when dealing with codes over Z4 . Let x ∈ Zn4 ; suppose that n a (x) denotes the number
of components of x equal to a for all a ∈ Z4 . The Hamming weight of x is wt H (x) =
n 1 (x) + n 2 (x) + n 3 (x), the Lee weight of x is wt L (x) = n 1 (x) + 2n 2 (x) + n 3 (x), and the
Euclidean weight of x is wt E (x) = n 1 (x) + 4n 2 (x) + n 3 (x). Thus components equaling 1 or
3 contribute 1 to each weight while 2 contributes 2 to the Lee and 4 to the Euclidean weight.
The Hamming, Lee, and Euclidean distances between x and y are d H (x, y) = wt H (x − y),
d L (x, y) = wt L (x − y), and d E (x, y) = wt E (x − y), respectively.
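For small vectors these weights are easy to compute by machine; the following sketch (function names are ours, not the book's) implements the three weights and the Lee distance:

```python
# The three weights on vectors over Z4, per the definitions above.
# Note min(c, 4 - c) is 0, 1, 2, 1 for c = 0, 1, 2, 3: the Lee weight.
def wt_H(x):
    return sum(1 for c in x if c % 4 != 0)             # Hamming: nonzero entries

def wt_L(x):
    return sum(min(c % 4, 4 - c % 4) for c in x)       # Lee: 1,3 -> 1 and 2 -> 2

def wt_E(x):
    return sum(min(c % 4, 4 - c % 4) ** 2 for c in x)  # Euclidean: 1,3 -> 1 and 2 -> 4

def d_L(x, y):
    return wt_L([(a - b) % 4 for a, b in zip(x, y)])   # Lee distance = wt_L(x - y)

assert (wt_H([1, 2, 3, 0]), wt_L([1, 2, 3, 0]), wt_E([1, 2, 3, 0])) == (3, 4, 6)
```

The Hamming and Euclidean distances are obtained from wt_H and wt_E of the difference in exactly the same way.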
Exercise 688 Do the following:
(a) What are the Hamming, Lee, and Euclidean weights of the vector 120 203 303?
(b) What are the Hamming, Lee, and Euclidean distances between the vectors 30 012 221
and 20 202 213?
Exercise 689 Find the Hamming, Lee, and Euclidean weight distributions of the code C
of Example 12.1.1.
Exercise 690 Let C be a Z4 -linear code that contains a codeword with only 1s and 3s.
Prove that the Lee weight of all codewords in C ⊥ is even.
The following theorem is the Z4 analogue to Theorem 1.4.5(iv).
Theorem 12.1.5 Let C be a self-orthogonal Z4 -linear code with c ∈ C. Then:
(i) wt L (c) ≡ 0 (mod 2), and
(ii) wt E (c) ≡ 0 (mod 4).
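Theorem 12.1.5 can be sanity-checked numerically; the sketch below (our code, not the book's) does so for the small self-orthogonal code generated by (1, 1, 1, 1):

```python
# Numerical check of Theorem 12.1.5 on a small self-orthogonal code.
gen = (1, 1, 1, 1)        # 1+1+1+1 = 4 ≡ 0 (mod 4), so the code is self-orthogonal
C = {tuple((a * g) % 4 for g in gen) for a in range(4)}

# confirm self-orthogonality of the whole code
assert all(sum(u * w for u, w in zip(c, d)) % 4 == 0 for c in C for d in C)

wt_L = lambda v: sum(min(x, 4 - x) for x in v)
wt_E = lambda v: sum(min(x, 4 - x) ** 2 for x in v)
assert all(wt_L(c) % 2 == 0 for c in C)   # (i): Lee weights are even
assert all(wt_E(c) % 4 == 0 for c in C)   # (ii): Euclidean weights ≡ 0 (mod 4)
```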
Exercise 691 Prove Theorem 12.1.5.
There are also analogues to the MacWilliams equations for Z4 -linear codes C of length n.
Let
HamC (x, y) = Σ_{c∈C} x^{wtH(c)} y^{n−wtH(c)}
be the Hamming weight enumerator of C. Let
LeeC (x, y) = Σ_{c∈C} x^{wtL(c)} y^{2n−wtL(c)}                    (12.4)
be the Lee weight enumerator of C; note the presence of 2n in the exponent of y, which
is needed because there may be codewords of Lee weight as high as 2n. The analogs of
(M3 ) from Section 7.2 are [59, 116, 173, 174]:
HamC⊥ (x, y) = (1/|C|) HamC (y − x, y + 3x), and                  (12.5)

LeeC⊥ (x, y) = (1/|C|) LeeC (y − x, y + x).                       (12.6)
Exercise 692 Give the Hamming and Lee weight enumerators for the self-dual code C of
Example 12.1.1. Then verify (12.5) and (12.6) for this code.
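Identities such as (12.6) are also easy to confirm by brute force on small codes. The following sketch (the length-2 example code and all names are ours, not the book's) computes a dual directly from the definition and compares both sides of (12.6) at many integer points:

```python
# Brute-force numerical check of the Lee-weight MacWilliams identity (12.6)
# on the length-2 Z4-linear code generated by (1, 1).
from itertools import product

n = 2
C = sorted({tuple((a * g) % 4 for g in (1, 1)) for a in range(4)})
# Dual code: all vectors of Z4^n orthogonal (mod 4) to every codeword.
Cperp = [v for v in product(range(4), repeat=n)
         if all(sum(vi * ci for vi, ci in zip(v, c)) % 4 == 0 for c in C)]

wt_L = lambda v: sum(min(c, 4 - c) for c in v)

def lee_enum(code, x, y):
    """Lee weight enumerator Lee_C(x, y) evaluated at integers x, y."""
    return sum(x ** wt_L(c) * y ** (2 * n - wt_L(c)) for c in code)

# (12.6): |C| * Lee_{C-perp}(x, y) = Lee_C(y - x, y + x) at every point.
for x, y in product(range(-3, 4), repeat=2):
    assert len(C) * lee_enum(Cperp, x, y) == lee_enum(C, y - x, y + x)
```

Evaluating both sides at enough points is equivalent to comparing the homogeneous polynomials themselves.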
There are two other weight enumerators of interest with Z4 -linear codes. With n i (v)
defined as before, let
cweC (a, b, c, d) = Σ_{v∈C} a^{n0(v)} b^{n1(v)} c^{n2(v)} d^{n3(v)}
denote the complete weight enumerator of C, and let
sweC (a, b, c) = Σ_{v∈C} a^{n0(v)} b^{n1(v)+n3(v)} c^{n2(v)}      (12.7)
denote the symmetrized weight enumerator of C. Notice that equivalent codes have the same
symmetrized weight enumerators (making this one the more useful) while equivalent codes
may not have the same complete weight enumerators. The corresponding MacWilliams
equations are:
cweC⊥ (a, b, c, d) = (1/|C|) cweC (a + b + c + d, a + ib − c − id,
                                   a − b + c − d, a − ib − c + id), and    (12.8)

sweC⊥ (a, b, c) = (1/|C|) sweC (a + 2b + c, a − c, a − 2b + c),            (12.9)

where i = √−1.
Exercise 693 Give the complete and symmetrized weight enumerators for the self-dual
code C of Example 12.1.1. Then verify (12.8) and (12.9) for this code.
Exercise 694 Do the following:
(a) Show that sweC (a, b, c) = cweC (a, b, c, b).
(b) Verify (12.9) assuming that (12.8) is valid.
(c) Show that HamC (x, y) = sweC (y, x, x).
(d) Verify (12.5) assuming that (12.9) is valid.
12.2 Binary codes from Z4 -linear codes
The vehicle by which binary codes are obtained from Z4 -linear codes is the Gray map
G : Z4 → F22 defined by
G(0) = 00, G(1) = 01, G(2) = 11, and G(3) = 10.
This map is then extended componentwise to a map, also denoted G, from Z4^n to F2^2n . If C
is a Z4 -linear code, its Gray image will be the binary code denoted G(C). Note that C and
G(C) have the same size. In general, however, G(C) will be nonlinear.
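A minimal sketch of the Gray map (the dictionary encoding and names below are ours, not the book's) also exhibits the fact, used repeatedly below, that the Hamming weight of G(v) equals the Lee weight of v:

```python
# The Gray map G on Z4 and its componentwise extension to Z4^n.
GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

def gray(v):
    """Componentwise Gray map from Z4^n to F2^(2n)."""
    return tuple(bit for c in v for bit in GRAY[c % 4])

wt_L = lambda v: sum(min(c % 4, 4 - c % 4) for c in v)

v = (1, 2, 0, 3)
assert gray(v) == (0, 1, 1, 1, 0, 0, 1, 0)
# Hamming weight of the Gray image equals the Lee weight of v:
assert sum(gray(v)) == wt_L(v) == 4
```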
Exercise 695 Find all pairs of elements a and b of Z4 where G(a + b) = G(a) + G(b).
Exercise 696 Let C be the Z4 -linear code of Example 12.1.1.
(a) List the 16 codewords of G(C).
(b) Show that G(C) is linear.
(c) What is the binary code obtained?
Exercise 697 Let C be the Z4 -linear code of length 3 with generator matrix

G = [1 0 1]
    [0 1 3].
(a) List the 16 codewords in C.
(b) List the 16 codewords in G(C).
(c) Show that G(C) is nonlinear.
In doing Exercises 696 and 697, you will note that the Lee weight of v ∈ Zn4 is the
ordinary Hamming weight of its Gray image G(v). This leads to an important property of
the Gray images of Z4 -linear codes. A code C (over Z4 or Fq ) is distance invariant provided
the Hamming weight distribution of c + C is the same for all c ∈ C. Note that all linear
codes (over Z4 or Fq ) must be distance invariant simply because c + C = C for all c ∈ C.
Example 12.2.1 When dealing with nonlinear binary codes the distance distribution of the
code is more significant than the weight distribution because the former gives information
about the error-correcting capability of the code. Recall from Chapter 2 that the Hamming
distance distribution of a code C of length n is the set {Bi (C) | 0 ≤ i ≤ n}, where
Bi (C) = (1/|C|) Σ_{c∈C} |{v ∈ C | d(v, c) = i}|
and d(v, c) is the Hamming distance between c and v; for codes over Z4 , use the same
definition with d H in place of d. Thus {Bi (C) | 0 ≤ i ≤ n} is the average of the weight
distributions of c + C for all c ∈ C. If the code is distance invariant, the distance distribution
is the same as the weight distribution of any set c + C; if in addition, the code contains the
zero codeword, the distance distribution is the same as the weight distribution of C. Thus
being distance invariant is a powerful property of a code. For instance, let C be the binary
code of length 6 with codewords c1 = 111100, c2 = 001111, and c3 = 101010. The weight
distribution of c j + C is
j\i   0   1   2   3   4   5   6
 1    1   0   0   1   1   0   0
 2    1   0   0   1   1   0   0
 3    1   0   0   2   0   0   0
Thus C is not distance invariant, and its distance distribution is B0 (C) = 1, B3 (C) = 4/3,
and B4 (C) = 2/3, with Bi (C) = 0 otherwise. The minimum weight 3 of this code happens
to equal the minimum distance of the code as well; this does not hold in general. See
Exercise 698.
Exercise 698 Let C be the binary code of length 7 with codewords c1 = 1111001, c2 =
0011111, and c3 = 1010101.
(a) Find the weight distribution of c j + C for j = 1, 2, and 3.
(b) Find the distance distribution of C.
(c) Find the minimum distance and the minimum weight of C.
In what follows we will still denote the Hamming weight of a binary vector v by wt(v).
Theorem 12.2.2 The following hold:
(i) The Gray map G is a distance preserving map from Z4^n with Lee distance to F2^2n with Hamming distance.
(ii) If C is a Z4 -linear code, then G(C) is distance invariant.
(iii) If C is a Z4 -linear code, then the Hamming weight distribution of G(C) is the same as
the Lee weight distribution of C.
Proof: By Exercise 699, if a and b are in Z4 , then wt L (a − b) = wt(G(a) − G(b)). It
follows that if v = v1 · · · vn and w = w1 · · · wn are in Zn4 , then
d L (v, w) = Σ_{i=1}^{n} wt L (vi − wi ) = Σ_{i=1}^{n} wt(G(vi ) − G(wi )) = d(G(v), G(w)),
verifying (i).
By Exercise 699, if a and b are in Z4 , then wt L (a − b) = wt(G(a) + G(b)), implying
that if v = v1 · · · vn and w = w1 · · · wn are in Zn4 , then
wt L (v − w) = Σ_{i=1}^{n} wt L (vi − wi ) = Σ_{i=1}^{n} wt(G(vi ) + G(wi )) = wt(G(v) + G(w)).
Therefore the Hamming weight distribution of G(c) + G(C) is the same as the Lee weight distribution of c − C = C. Hence (ii) and (iii) follow.
Exercise 699 Prove that if a and b are in Z4 , then wt L (a − b) = wt(G(a) + G(b)).
In Exercises 696 and 697 we see that the Gray image of a code may be linear or nonlinear.
It is natural to ask if we can tell when the Gray image will be linear. One criterion is the
following, where v ∗ w is the componentwise product of the two vectors v and w in Zn4 .
Theorem 12.2.3 Let C be a Z4 -linear code. The binary code G(C) is linear if and only if
whenever v and w are in C, so is 2(v ∗ w).
Proof: By Exercise 700, if a and b are in Z4 , then G(a) + G(b) = G(a + b + 2ab). Therefore if v and w are in Zn4 , then G(v) + G(w) = G(v + w + 2(v ∗ w)). In particular, this shows
that if v and w are in C, then G(v) + G(w) is in G(C) if and only if v + w + 2(v ∗ w) ∈ C
if and only if 2(v ∗ w) ∈ C, since v + w ∈ C. The result now follows.
Exercise 700 Prove that if a and b are in Z4 , then
G(a) + G(b) = G(a + b + 2ab).
See the related Exercise 695.
Exercise 701 Verify that the code C of Example 12.1.1 (and Exercise 696) satisfies the
criterion of Theorem 12.2.3, while the code C of Exercise 697 does not.
We will use the following theorem frequently; compare this with Theorems 1.4.3 and
1.4.8.
Theorem 12.2.4 The following hold:
(i) Let u and v be in Zn4 . Then
wt E (u + v) ≡ wt E (u) + wt E (v) + 2(u · v) (mod 8).
(ii) Let C be a self-orthogonal Z4 -linear code which has a generator matrix G such that
all rows r of G satisfy wt E (r) ≡ 0 (mod 8). Then wt E (c) ≡ 0 (mod 8) for all c ∈ C.
(iii) Let C be a Z4 -linear code such that wt E (c) ≡ 0 (mod 8) for all c ∈ C. Then C is self-orthogonal.
Exercise 702 Prove Theorem 12.2.4.
We now apply this theorem to show how to construct the Nordstrom–Robinson code as
the Gray image of a Z4 -linear code.
Example 12.2.5 Let o8 be the Z4 -linear code, called the octacode, with generator matrix

    [1 0 0 0 3 1 2 1]
G = [0 1 0 0 1 2 3 1]
    [0 0 1 0 3 3 3 2]
    [0 0 0 1 2 3 1 1].
The rows of G are orthogonal to themselves and pairwise orthogonal to each other. Thus o8 is
self-orthogonal, and, as it has type 4^4 , o8 is self-dual. The Euclidean weight of each row of G
equals 8. Hence by Theorem 12.2.4, every codeword of o8 has Euclidean weight a multiple
of 8. The minimum Euclidean weight is therefore 8 as each row of G has that Euclidean
weight. By Exercise 703, o8 has minimum Lee weight 6. The Gray image G(o8 ) therefore is
a (16, 256, 6) binary code by Theorem 12.2.2. By [317], G(o8 ) is the Nordstrom–Robinson
code; see the discussion at the end of Section 2.3.4. We will examine this code more closely
when we study the Kerdock codes in Section 12.7. In particular, Exercise 755 will show
that the octacode is related to the Kerdock code K(4) and Exercise 759 will give its Lee
weight distribution.
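The claims made about o8 in this example can be confirmed by brute force over its 256 codewords (the helper names in this sketch are ours, not the book's):

```python
# Brute-force check of the octacode: 256 codewords (type 4^4), minimum
# Lee weight 6, and every Euclidean weight divisible by 8.
from itertools import product

G = [(1, 0, 0, 0, 3, 1, 2, 1),
     (0, 1, 0, 0, 1, 2, 3, 1),
     (0, 0, 1, 0, 3, 3, 3, 2),
     (0, 0, 0, 1, 2, 3, 1, 1)]

o8 = {tuple(sum(a * row[i] for a, row in zip(coef, G)) % 4 for i in range(8))
      for coef in product(range(4), repeat=4)}

wt_L = lambda v: sum(min(c, 4 - c) for c in v)
wt_E = lambda v: sum(min(c, 4 - c) ** 2 for c in v)

assert len(o8) == 256                             # type 4^4
assert min(wt_L(c) for c in o8 if any(c)) == 6    # minimum Lee weight 6
assert all(wt_E(c) % 8 == 0 for c in o8)          # Euclidean weights ≡ 0 (mod 8)
```

By Theorem 12.2.2, the Gray image of o8 is therefore a binary (16, 256, 6) code.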
Exercise 703 In this exercise, you will show that the octacode o8 has minimum Lee
weight 6. By Theorem 12.1.5, the codewords of o8 all have even Lee weight; each row of
the generator matrix has Lee weight 6.
(a) Show that there cannot be a vector in Zn4 of Lee weight 2 and Euclidean weight 8.
(b) Show that the only vectors in Zn4 of Lee weight 4 and Euclidean weight 8 have exactly
two components equal to 2 and the remaining components equal to 0.
(c) By considering the generator matrix G for o8 in Example 12.2.5, show that o8 has no
codewords of Lee weight 4.
Exercise 704 Using Theorem 12.2.3 show that G(o8 ) is nonlinear.
12.3 Cyclic codes over Z4
As with cyclic codes over a field, cyclic codes over Z4 form an important family of Z4 -linear
codes. A body of theory has been developed to handle these codes with obvious parallels
to the theory of cyclic codes over fields.
As with usual cyclic codes over Fq , we view codewords c = c0 c1 · · · cn−1 in a cyclic
Z4 -linear code of length n as polynomials c(x) = c0 + c1 x + · · · + cn−1 x n−1 ∈ Z4 [x]. If
we consider our polynomials as elements of the quotient ring
Rn = Z4 [x]/(x n − 1),
then xc(x) modulo x n − 1 represents the cyclic shift of c. When studying cyclic codes over
Fq , we found generator polynomials and generating idempotents of these codes. It is natural
to ask if such polynomials exist in the Z4 world. To do this we must study the factorization
of x n − 1 over Z4 .
12.3.1 Factoring x n − 1 over Z4
There are significant differences between the rings Z4 [x] and Fq [x]. An important, obvious difference is that the degree of a product
of two polynomials in Z4 [x] may be less than the sum of the degrees of the polynomials.
Among other things this means that it is possible for a nonconstant polynomial in Z4 [x] to
be invertible, that is, a unit. See Exercise 705.
Exercise 705 Compute (1 + 2s(x))^2 for any s(x) ∈ Z4 [x]. Show how this illustrates the
fact that the degree of a product of two polynomials in Z4 [x] may be less than the sum
of the degrees of the polynomial factors. This computation shows that 1 + 2s(x) has a
multiplicative inverse in Z4 [x]; what is its inverse?
We still say that a polynomial f (x) ∈ Z4 [x] is irreducible if whenever f (x) = g(x)h(x)
for two polynomials g(x) and h(x) in Z4 [x], one of g(x) or h(x) is a unit. Since units do
not have to be constant polynomials and the degree of f (x) may be less than the sum of the
degrees of g(x) and h(x), it is more difficult in Z4 [x] to check whether or not a polynomial
is irreducible. Recall that in Fq [x] nonconstant polynomials can be factored into a product
of irreducible polynomials which are unique up to scalar multiplication; see Theorem 3.4.1.
This is not the case for polynomials in Z4 [x], even polynomials of the form we are most
interested in, as the next example illustrates.
Example 12.3.1 The following are two factorizations of x 4 − 1 into irreducible polynomials in Z4 [x]:
x 4 − 1 = (x − 1)(x + 1)(x 2 + 1) = (x + 1)^2 (x 2 + 2x − 1).
The verification that these polynomials are in fact irreducible is tedious but straightforward.
(If, for example, you assume that x 2 + 2x − 1 = g(x)h(x), to show irreducibility then one
of g(x) or h(x) must be shown to be a unit. Since the degree of a product is not necessarily
the sum of the degrees, it cannot be assumed for instance that g(x) and h(x) are both of
degree 1.)
The proper context for discussing factorization in Z4 [x] is not factoring polynomials
into a product of irreducible polynomials but into a product of polynomials called basic
irreducible polynomials. In order to discuss this concept, we need some notation and terminology. Define µ : Z4 [x] → F2 [x] by µ( f (x)) = f (x) (mod 2); that is, µ is determined by
µ(0) = µ(2) = 0, µ(1) = µ(3) = 1, and µ(x) = x. By Exercise 706, µ is a surjective ring
homomorphism with kernel (2) = {2s(x) | s(x) ∈ Z4 [x]}. In particular, this implies that if
f (x) ∈ F2 [x], there is a g(x) ∈ Z4 [x] such that µ(g(x)) = f (x), and two such g(x)s differ by
an element 2s(x) for some s(x) ∈ Z4 [x]. The map µ is called the reduction homomorphism.
We use these facts in what follows without reference.
Exercise 706 Prove that µ is a surjective ring homomorphism with kernel (2) = {2s(x) |
s(x) ∈ Z4 [x]}.
A polynomial f (x) ∈ Z4 [x] is basic irreducible if µ( f (x)) is irreducible in F2 [x]; it is
monic if its leading coefficient is 1. An ideal I of a ring R is called a primary ideal provided
ab ∈ I implies that either a ∈ I or br ∈ I for some positive integer r . A polynomial f (x) ∈
Z4 [x] is primary if the principal ideal ( f (x)) = { f (x)g(x) | g(x) ∈ Z4 [x]} is a primary
ideal.
Exercise 707 Examine the two factorizations of x 4 − 1 in Example 12.3.1. Which factors
are basic irreducible polynomials and which are not?
Lemma 12.3.2 If f (x) ∈ Z4 [x] is a basic irreducible polynomial, then f (x) is a primary
polynomial.
Proof: Suppose that g(x)h(x) ∈ ( f (x)). As µ( f (x)) is irreducible, d = gcd(µ(g(x)),
µ( f (x))) is either 1 or µ( f (x)). If d = 1, then by the Euclidean Algorithm there exist polynomials a(x) and b(x) in Z4 [x] such that µ(a(x))µ(g(x)) + µ(b(x))µ( f (x)) = 1. Hence
a(x)g(x) + b(x) f (x) = 1 + 2s(x) for some s(x) ∈ Z4 [x]. Therefore a(x)g(x)h(x)(1 +
2s(x)) + b(x) f (x)h(x)(1 + 2s(x)) = h(x)(1 + 2s(x))^2 = h(x), implying that h(x) ∈
( f (x)). Suppose now that d = µ( f (x)). Then there exists a(x) ∈ Z4 [x] such that µ(g(x)) =
µ( f (x))µ(a(x)), implying that g(x) = f (x)a(x) + 2s(x) for some s(x) ∈ Z4 [x]. Hence
g(x)^2 = ( f (x)a(x))^2 ∈ ( f (x)). Thus f (x) is a primary polynomial.
If R = Z4 [x] or F2 [x], two polynomials f (x) and g(x) in R are coprime or relatively
prime provided R = ( f (x)) + (g(x)).
Exercise 708 Let R = Z4 [x] or F2 [x] and suppose that f (x) and g(x) are polynomials in
R. Prove that f (x) and g(x) are coprime if and only if there exist a(x) and b(x) in R such
that a(x) f (x) + b(x)g(x) = 1.
Exercise 709 Let R = Z4 [x] or F2 [x]. Let k ≥ 2. Prove that if f i (x) are pairwise coprime
polynomials in R for 1 ≤ i ≤ k, then f 1 (x) and f 2 (x) f 3 (x) · · · f k (x) are coprime.
Lemma 12.3.3 Let f (x) and g(x) be polynomials in Z4 [x]. Then f (x) and g(x) are coprime
if and only if µ( f (x)) and µ(g(x)) are coprime polynomials in F2 [x].
Proof: Suppose that f (x) and g(x) are coprime. By Exercise 708, a(x) f (x) + b(x)g(x) = 1
for some a(x) and b(x) in Z4 [x]. Then µ(a(x))µ( f (x)) + µ(b(x))µ(g(x)) = µ(1) = 1,
implying that µ( f (x)) and µ(g(x)) are coprime. Conversely, suppose that µ( f (x)) and
µ(g(x)) are coprime. Then there exist a(x) and b(x) in Z4 [x] such that µ(a(x))µ( f (x)) +
µ(b(x))µ(g(x)) = 1. Thus a(x) f (x) + b(x)g(x) = 1 + 2s(x) for some s(x) ∈ Z4 [x]. But
then a(x)(1 + 2s(x)) f (x) + b(x)(1 + 2s(x))g(x) = (1 + 2s(x))^2 = 1 showing that f (x)
and g(x) are coprime by Exercise 708.
The following, which is a special case of a result called Hensel’s Lemma, shows how to
get from a factorization of µ( f (x)) to a factorization of f (x).
Theorem 12.3.4 (Hensel’s Lemma) Let f (x) ∈ Z4 [x]. Suppose µ( f (x)) = h 1 (x)h 2 (x) · · ·
h k (x), where h 1 (x), h 2 (x), . . . , h k (x) are pairwise coprime polynomials in F2 [x]. Then there
exist g1 (x), g2 (x), . . . , gk (x) in Z4 [x] such that:
(i) µ(gi (x)) = h i (x) for 1 ≤ i ≤ k,
(ii) g1 (x), g2 (x), . . . , gk (x) are pairwise coprime, and
(iii) f (x) = g1 (x)g2 (x) · · · gk (x).
Proof: The proof is by induction on k. Suppose k = 2. Choose g1′ (x) and g2′ (x) in Z4 [x]
so that µ(g1′ (x)) = h 1 (x) and µ(g2′ (x)) = h 2 (x). So f (x) = g1′ (x)g2′ (x) + 2s(x) for some
s(x) ∈ Z4 [x]. As h 1 (x) and h 2 (x) are coprime, so are g1′ (x) and g2′ (x) by Lemma 12.3.3. Thus
by Exercise 708 there are polynomials ai (x) ∈ Z4 [x] such that a1 (x)g1′ (x) + a2 (x)g2′ (x) =
1. Let g1 (x) = g1′ (x) + 2a2 (x)s(x) and g2 (x) = g2′ (x) + 2a1 (x)s(x). Then (i) and (ii)
hold as µ(gi (x)) = µ(gi′ (x)) = h i (x). Also g1 (x)g2 (x) = g1′ (x)g2′ (x) + 2(a1 (x)g1′ (x) +
a2 (x)g2′ (x))s(x) = g1′ (x)g2′ (x) + 2s(x) = f (x).
Now suppose k = 3. By Exercise 709, h 1 (x) and h 2 (x)h 3 (x) are coprime. Thus using
the case k = 2, there exist coprime polynomials g1 (x) and g23 (x) in Z4 [x] such that
µ(g1 (x)) = h 1 (x), µ(g23 (x)) = h 2 (x)h 3 (x), and f (x) = g1 (x)g23 (x). Since h 2 (x) and h 3 (x)
are coprime, again using the case k = 2, there are coprime polynomials g2 (x) and g3 (x)
such that µ(g2 (x)) = h 2 (x), µ(g3 (x)) = h 3 (x), and g23 (x) = g2 (x)g3 (x). This completes
the case k = 3. Continuing inductively gives the result for all k.
Exercise 710 Let a(x) ∈ Z4 [x]. Prove that the following are equivalent:
(a) a(x) is invertible in Z4 [x].
(b) a(x) = 1 + 2s(x) for some s(x) ∈ Z4 [x].
(c) µ(a(x)) = 1. (Note that 1 is the only unit in F2 [x].)
Corollary 12.3.5 Let f (x) ∈ Z4 [x]. Suppose that all the roots of µ( f (x)) are distinct. Then
f (x) is irreducible if and only if µ( f (x)) is irreducible.
Proof: If µ( f (x)) is reducible, it factors into irreducible polynomials that are pairwise
coprime as µ( f (x)) has distinct roots. By Hensel’s Lemma, f (x) has a nontrivial factorization into basic irreducibles, which cannot be units by Exercise 710, implying that f (x)
is reducible. Suppose that f (x) = g(x)h(x), where neither g(x) nor h(x) are units. Then
µ( f (x)) = µ(g(x))µ(h(x)) and neither µ(g(x)) nor µ(h(x)) are units by Exercise 710, so µ( f (x)) is reducible.
After presenting one final result without proof, we will be ready to discuss the factorization of x n − 1 ∈ Z4 [x]. This result, which is a special case of [230, Theorem XIII.6], applies
to regular polynomials in Z4 [x]. A regular polynomial in Z4 [x] is any nonzero polynomial
that is not a divisor of zero.²
Lemma 12.3.6 Let f (x) be a regular polynomial in Z4 [x]. Then there exist a monic polynomial f ′ (x) and a unit u(x) in Z4 [x] such that µ( f ′ (x)) = µ( f (x)) and f ′ (x) = u(x) f (x).
Since F2 [x] is a unique factorization domain, µ(x n − 1) = x n + 1 ∈ F2 [x] has a factorization h 1 (x)h 2 (x) · · · h k (x) into irreducible polynomials. These are pairwise coprime
if and only if n is odd. So assuming that n is odd, by Hensel's Lemma, there is a factorization x n − 1 = g1 (x)g2 (x) · · · gk (x) into pairwise coprime basic irreducible polynomials
gi (x) ∈ Z4 [x] such that µ(gi (x)) = h i (x). By Lemma 12.3.2, each gi (x) is a primary polynomial. The polynomial x n − 1 is not a divisor of zero in Z4 [x] and so is regular. Using the
Factorization Theorem [230], which applies to regular polynomials that are factored into
products of primary polynomials, this factorization of x n − 1 into basic irreducible polynomials is unique up to multiplication by units. Furthermore, by Corollary 12.3.5, these basic
irreducible polynomials are in fact irreducible. These gi (x) are also regular. Lemma 12.3.6
implies that since gi (x) is regular, there is a monic polynomial gi′ (x) ∈ Z4 [x] such that
µ(gi′ (x)) = µ(gi (x)) and gi′ (x) = u i (x)gi (x) for some unit u i (x) ∈ Z4 [x]. Each gi′ (x) is irreducible. Exercise 711 shows that x n − 1 = g1′ (x)g2′ (x) · · · gk′ (x); hence we have shown
that when n is odd, x n − 1 can be factored into a unique product of monic irreducible
polynomials in Z4 [x]. Example 12.3.1 and Exercise 707 indicate why we cannot drop the
condition that n be odd. This gives our starting point for discussing cyclic codes.
Theorem 12.3.7 Let n be odd. Then x n − 1 = g1 (x)g2 (x) · · · gk (x) where gi (x) ∈ Z4 [x] are
unique monic irreducible (and basic irreducible) pairwise coprime polynomials in Z4 [x].
² A nonzero element a in a ring R is a divisor of zero provided ab = 0 for some nonzero element b ∈ R.
Furthermore, x n + 1 = µ(g1 (x))µ(g2 (x)) · · · µ(gk (x)) is a factorization into irreducible
polynomials in F2 [x].
Exercise 711 For n odd, x n − 1 = g1 (x)g2 (x) · · · gk (x) where gi (x) are irreducible,
and hence regular. By Lemma 12.3.6, x n − 1 = a(x)g1′ (x)g2′ (x) · · · gk′ (x), where gi′ (x) are
monic, µ(gi′ (x)) = µ(gi (x)), and a(x) is a unit. Show that a(x) = 1. Hint: See Exercise 710.
Exercise 712 This illustrates Lemma 12.3.6. Let gi (x) = 2x + 1.
(a) What is µ(gi (x))?
(b) Show that gi′ (x) = 1 satisfies both µ(gi′ (x)) = µ(gi (x)) and gi′ (x) = u i (x)gi (x) for the
unit u i (x) = 2x + 1.
In order to factor x n − 1 in Z4 [x], we first factor x n + 1 in F2 [x] and then use Hensel’s
Lemma. The proof of Hensel’s Lemma then gives a method for computing the factorization
of x n − 1 in Z4 [x]. However, that method is too tedious. We introduce, without proof,
another method that will produce the factorization; this method is due to Graeffe [334] but
was adapted to Z4 -linear codes in [318, Section 4 and Theorem 2]:
I. Let h(x) be an irreducible factor of x n + 1 in F2 [x]. Write h(x) = e(x) + o(x), where
e(x) is the sum of the terms of h(x) with even exponents and o(x) is the sum of the terms
of h(x) with odd exponents.
II. Then g(x) is the irreducible factor of x n − 1 in Z4 [x] with µ(g(x)) = h(x), where
g(x 2 ) = ±(e(x)2 − o(x)2 ).
Example 12.3.8 In F2 [x], x 7 + 1 = (x + 1)(x 3 + x + 1)(x 3 + x 2 + 1) is the factorization
of x 7 + 1 into irreducible polynomials. We apply Graeffe’s method to each factor to obtain
the factorization of x 7 − 1 into monic irreducible polynomials of Z4 [x].
• If h(x) = x + 1, then e(x) = 1 and o(x) = x. So g(x^2) = −(1 − x^2) = x^2 − 1 and thus g(x) = x − 1.
• If h(x) = x^3 + x + 1, then e(x) = 1 and o(x) = x^3 + x. So g(x^2) = −(1 − (x^3 + x)^2) = x^6 + 2x^4 + x^2 − 1 and thus g(x) = x^3 + 2x^2 + x − 1.
• If h(x) = x^3 + x^2 + 1, then e(x) = x^2 + 1 and o(x) = x^3. So g(x^2) = −((x^2 + 1)^2 − (x^3)^2) = x^6 − x^4 + 2x^2 − 1 and thus g(x) = x^3 − x^2 + 2x − 1.
Therefore x 7 − 1 = (x − 1)(x 3 + 2x 2 + x − 1)(x 3 − x 2 + 2x − 1) is the factorization of
x 7 − 1 into monic irreducible polynomials in Z4 [x].
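Graeffe's method as described in steps I and II is short to implement; the sketch below (function names are ours, not the book's) represents polynomials as coefficient lists with constant term first and reproduces the factors computed above:

```python
# Graeffe's method: lift an irreducible factor h(x) of x^n + 1 over F2
# to the corresponding monic irreducible factor g(x) of x^n - 1 over Z4.
def polymul(p, q, m=4):
    """Multiply two coefficient lists modulo m."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] = (r[i + j] + a * b) % m
    return r

def graeffe(h):
    e = [c if i % 2 == 0 else 0 for i, c in enumerate(h)]  # even-exponent terms
    o = [c if i % 2 == 1 else 0 for i, c in enumerate(h)]  # odd-exponent terms
    sq = [(a - b) % 4 for a, b in zip(polymul(e, e), polymul(o, o))]
    g = sq[::2]               # e(x)^2 - o(x)^2 = g(x^2): keep even-degree terms
    if g[-1] != 1:            # choose the sign making g monic
        g = [(-c) % 4 for c in g]
    return g

# h(x) = x^3 + x + 1 lifts to g(x) = x^3 + 2x^2 + x - 1, as in Example 12.3.8
assert graeffe([1, 1, 0, 1]) == [3, 1, 2, 1]    # -1 ≡ 3 (mod 4)
```

Running graeffe on all three irreducible factors of x^7 + 1 reproduces the factorization of x^7 − 1 above.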
Exercise 713 Do the following:
(a) Verify that x 9 + 1 = (x + 1)(x 2 + x + 1)(x 6 + x 3 + 1) is the factorization of x 9 + 1
into irreducible polynomials in F2 [x].
(b) Apply Graeffe’s method to find the factorization of x 9 − 1 into monic irreducible poly
nomials in Z4 [x].
Exercise 714 Do the following:
(a) Verify that x 15 + 1 = (x + 1)(x 2 + x + 1)(x 4 + x 3 + 1)(x 4 + x + 1)(x 4 + x 3 + x 2 +
x + 1) is the factorization of x 15 + 1 into irreducible polynomials in F2 [x].
(b) Apply Graeffe’s method to find the factorization of x 15 − 1 into monic irreducible
polynomials in Z4 [x].
Exercise 715 Do the following:
(a) Suppose that n is odd and the factorization of x n + 1 into irreducible polynomials in
F2 [x] is x n + 1 = (x + 1)(x n−1 + x n−2 + · · · + x + 1). Show that the factorization of
x n − 1 into monic irreducible polynomials in Z4 [x] is x n − 1 = (x − 1)(x n−1 + x n−2 +
· · · + x + 1). Note that this does not require the use of Graeffe’s method.
(b) Show that when n = 3, 5, 11, 13, and 19, the factorization of x n − 1 in Z4 [x] is given
by (a).
12.3.2 The ring Rn = Z4 [x]/(x n − 1)
To study cyclic codes over Fq we needed to find the ideals of Fq [x]/(x n − 1). Similarly, we
need to find the ideals of Rn in order to study cyclic codes over Z4 . We first need to know
the ideal structure of Z4 [x]/( f (x)), where f (x) is a basic irreducible polynomial.
Lemma 12.3.9 If f (x) ∈ Z4 [x] is a basic irreducible polynomial, then R = Z4 [x]/( f (x))
has only three ideals: (0), (2) = {2s(x) + ( f (x)) | s(x) ∈ Z4 [x]}, and (1) = R.
Proof: Suppose I is a nonzero ideal in R. Let g(x) + ( f (x)) ∈ I with g(x) ∉ ( f (x)). As µ( f (x)) is irreducible, gcd(µ(g(x)), µ( f (x))) is either 1 or µ( f (x)). Arguing as in the proof of Lemma 12.3.2, either a(x)g(x) + b(x) f (x) = 1 + 2s(x) or g(x) = f (x)a(x) + 2s(x) for some a(x), b(x), and s(x) in Z4 [x]. In the former case, a(x)g(x)(1 + 2s(x)) + b(x) f (x)(1 + 2s(x)) = (1 + 2s(x))^2 = 1, implying that g(x) + ( f (x)) has an inverse a(x)(1 + 2s(x)) + ( f (x)) in R. But this means that I contains an invertible element and hence equals R by Exercise 716. In the latter case, g(x) + ( f (x)) ∈ (2), and hence we may assume that I ⊆ (2). As I ≠ {0}, there exists h(x) ∈ Z4 [x] such that 2h(x) + ( f (x)) ∈ I with 2h(x) ∉ ( f (x)). Assume that µ(h(x)) ∈ (µ( f (x))); then h(x) = f (x)a(x) + 2s(x) for some a(x) and s(x) in Z4 [x], implying 2h(x) = 2 f (x)a(x) ∈ ( f (x)), a contradiction. Hence as µ(h(x)) ∉ (µ( f (x))) and µ( f (x)) is irreducible, gcd(µ(h(x)), µ( f (x))) = 1. Again, a(x)h(x) + b(x) f (x) = 1 + 2s(x) for some a(x), b(x), and s(x) in Z4 [x]. Therefore a(x)(2h(x)) + 2b(x) f (x) = 2, showing that 2 + ( f (x)) ∈ I and so (2) ⊆ I. Thus I = (2).
Exercise 716 Prove that if I is an ideal of a ring R and I contains an invertible element,
then I = R.
We do not have the Division Algorithm in Z4 [x] and so the next result is not as obvious
as the corresponding result in Fq [x].
Lemma 12.3.10 Let m(x) be a monic polynomial of degree r in Z4 [x]. Then Z4 [x]/(m(x))
has 4^r elements, and every element of Z4 [x]/(m(x)) is uniquely expressible in the form
a(x) + (m(x)), where a(x) is the zero polynomial or has degree less than r .
Proof: Let a(x) = ad x d + ad−1 x d−1 + · · · + a0 ∈ Z4 [x]. If d ≥ r , then b(x) = a(x) −
ad x d−r m(x) has degree less than d and a(x) + (m(x)) = b(x) + (m(x)). Hence every element a(x) + (m(x)) of Z4 [x]/(m(x)) has a representative of degree less than r . Furthermore,
if b1 (x) + (m(x)) = b2 (x) + (m(x)) with bi (x) each of degree less than r , then b1 (x) − b2 (x)
is a multiple of m(x). By Exercise 717, b1 (x) = b2 (x). The result follows.
Exercise 717 Let m(x) be a monic polynomial of degree r and h(x) a nonzero polynomial
of degree s, where m(x) and h(x) are in Z4 [x]. Prove that h(x)m(x) has degree r + s.
After one more lemma we are ready to give the ideal structure of Rn . It is these ideals that
produce cyclic codes. A principal ideal of Rn is generated by an element g(x) + (x n − 1); to avoid confusion with the notation for ideals of Z4 [x], we denote this principal ideal by ⟨g(x)⟩. (Note that we should actually use ⟨g(x) + (x n − 1)⟩, but we choose to drop this more cumbersome notation. Similarly, we will often say that a polynomial a(x) is in Rn rather than the more accurate expression that the coset a(x) + (x n − 1) is in Rn .)
Lemma 12.3.11 Let n be odd. Suppose m(x) is a monic polynomial of degree r which is a
product of distinct irreducible factors of x n − 1 in Z4 [x]. Then the ideal ⟨m(x)⟩ of Rn has 4^{n−r} elements and every element of ⟨m(x)⟩ is uniquely expressible in the form m(x)a(x), where a(x) is the zero polynomial or has degree less than n − r .
Proof: By Theorem 12.3.7 and Exercise 717, there exists a monic polynomial h(x) of degree n − r such that m(x)h(x) = x n − 1. Every element f (x) of ⟨m(x)⟩ has the form m(x)a(x). Let a(x) be chosen to be of smallest degree such that f (x) = m(x)a(x) in Rn . If a(x) = ad x^d + ad−1 x^{d−1} + · · · + a0 with ad ≠ 0 and d ≥ n − r , then b(x) = a(x) − ad x^{d−n+r} h(x) has degree less than d. So m(x)b(x) = m(x)a(x) − ad x^{d−n+r}(x n − 1) in Z4 [x], implying that f (x) = m(x)a(x) = m(x)b(x) in Rn , contradicting the choice of a(x). Thus every element of ⟨m(x)⟩ has the form m(x)a(x) where a(x) has degree less than n − r . Furthermore, if m(x)a1 (x) = m(x)a2 (x) in Rn where each ai (x) has degree less than n − r , then m(x)(a1 (x) − a2 (x)) has degree less than n in Z4 [x] and is a multiple of x n − 1, contradicting Exercise 717 unless a1 (x) = a2 (x).
Theorem 12.3.12 Let n be odd and let x n − 1 = g1 (x)g2 (x) · · · gk (x) be a factorization of x n − 1 into pairwise coprime monic irreducible polynomials in Z4 [x]. Let ĝi (x) = ∏_{j≠i} g j (x). Suppose that gi (x) has degree di . The following hold:
(i) Rn has 4^n elements.
(ii) If 1 ≤ i ≤ k, Rn = ⟨gi (x)⟩ ⊕ ⟨ĝi (x)⟩.
(iii) Rn = ⟨ĝ1 (x)⟩ ⊕ ⟨ĝ2 (x)⟩ ⊕ · · · ⊕ ⟨ĝk (x)⟩.
(iv) If 1 ≤ i ≤ k, ⟨ĝi (x)⟩ = ⟨êi (x)⟩, where {êi (x) | 1 ≤ i ≤ k} are pairwise orthogonal idempotents of Rn and Σ_{i=1}^{k} êi (x) = 1.
(v) If 1 ≤ i ≤ k, ⟨ĝi (x)⟩ ≃ Z4 [x]/(gi (x)) and ⟨ĝi (x)⟩ has 4^{di} elements.
(vi) Every ideal of Rn is a direct sum of ⟨ĝi (x)⟩s and ⟨2ĝi (x)⟩s.
Proof: Part (i) follows from Lemma 12.3.10. By Exercise 709, (gi(x)) + (ĝi(x)) = Z4[x].
Thus ⟨gi(x)⟩ + ⟨ĝi(x)⟩ = Rn. By Lemma 12.3.11, ⟨gi(x)⟩ and ⟨ĝi(x)⟩ have sizes 4^{n−di} and
4^{di}, respectively. Thus by (i) the sum ⟨gi(x)⟩ + ⟨ĝi(x)⟩ = Rn must be direct, proving (ii).
To prove (iii) note that for 1 ≤ j ≤ k

(∑_{i≠j} ⟨ĝi(x)⟩) ∩ ⟨ĝj(x)⟩ ⊆ ⟨gj(x)⟩ ∩ ⟨ĝj(x)⟩ = {0};

the containment follows because each ĝi(x) with i ≠ j is a multiple of gj(x), and the
equality follows from (ii). Thus the sum ⟨ĝ1(x)⟩ + ⟨ĝ2(x)⟩ + ··· + ⟨ĝk(x)⟩ is direct.

482
Codes over Z4

Since ⟨ĝi(x)⟩ has 4^{di} elements by Lemma 12.3.11 and d1 + d2 + ··· + dk = n, (i) implies (iii).
By part (iii)

1 = ∑_{i=1}^{k} êi(x)  in Rn,  (12.10)

where êi(x) ∈ ⟨ĝi(x)⟩. As êi(x)êj(x) ∈ ⟨ĝi(x)⟩ ∩ ⟨ĝj(x)⟩ = {0} if i ≠ j, multiplying
(12.10) by êi(x) shows that {êi(x) | 1 ≤ i ≤ k} are pairwise orthogonal idempotents of
Rn. Multiplying (12.10) by ĝi(x) shows that ĝi(x) = ĝi(x)êi(x), implying that ⟨ĝi(x)⟩ =
⟨êi(x)⟩, verifying (iv). We leave part (v) as Exercise 718.
Let I be an ideal of Rn. If a(x) ∈ I, multiplying (12.10) by a(x) shows that
a(x) = ∑_{i=1}^{k} a(x)êi(x). Since a(x)êi(x) ∈ ⟨ĝi(x)⟩, I = ∑_{i=1}^{k} I ∩ ⟨ĝi(x)⟩. By (v) and
Lemma 12.3.9, part (vi) follows.
Exercise 718 Use the notation of Theorem 12.3.12.
(a) Prove that φ : Z4[x]/(gi(x)) → ⟨êi(x)⟩ given by φ(a(x) + (gi(x))) =
a(x)êi(x) is a well-defined ring homomorphism.
(b) Prove that φ is one-to-one and onto. Hint: Use the fact that ⟨ĝi(x)⟩ = ⟨êi(x)⟩ along with
Lemmas 12.3.10 and 12.3.11.
12.3.3 Generating polynomials of cyclic codes over Z4
When n is odd we can give a pair of polynomials that “generate” any cyclic code of length
n over Z4 . The following result was proved in [279, 289].
Theorem 12.3.13 Let C be a cyclic code over Z4 of odd length n. Then there exist
unique monic polynomials f(x), g(x), and h(x) such that x^n − 1 = f(x)g(x)h(x) and
C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩. Furthermore, C has type 4^{deg h} 2^{deg g}.
Proof: Using the notation of Theorem 12.3.12, C is the direct sum of some ⟨ĝi(x)⟩s
and some ⟨2ĝi(x)⟩s where x^n − 1 = g1(x)g2(x)···gk(x). Rearrange the gi(x) so that
C = ∑_{i=1}^{a} ⟨ĝi(x)⟩ ⊕ ∑_{i=a+1}^{b} ⟨2ĝi(x)⟩. Let f(x) = ∏_{i=b+1}^{k} gi(x),
g(x) = ∏_{i=a+1}^{b} gi(x), and h(x) = ∏_{i=1}^{a} gi(x). Then x^n − 1 = f(x)g(x)h(x).
Since ĝi(x) is a multiple of f(x)g(x) for 1 ≤ i ≤ a, we have ∑_{i=1}^{a} ⟨ĝi(x)⟩ ⊆ ⟨f(x)g(x)⟩.
By Lemma 12.3.11, ⟨f(x)g(x)⟩ has size 4^{n−deg(fg)} = 4^{deg h}, which is also the size of
∑_{i=1}^{a} ⟨ĝi(x)⟩ by Theorem 12.3.12(iii) and (v). Hence ∑_{i=1}^{a} ⟨ĝi(x)⟩ = ⟨f(x)g(x)⟩.
Using the same argument and Exercise 719, ∑_{i=a+1}^{b} ⟨2ĝi(x)⟩ = ⟨2f(x)h(x)⟩. The
uniqueness follows since any monic polynomial factor of x^n − 1 must factor into a product of
basic irreducibles that must then be a product of some of the gi(x)s that are unique by
Theorem 12.3.7. The type of C follows from the fact that ⟨f(x)g(x)⟩ has size 4^{deg h} and
⟨2f(x)h(x)⟩ has size 2^{deg g}.
Exercise 719 Let n be odd and m(x) a monic product of irreducible factors of x^n − 1 in
Z4[x] of degree r. Prove that the ideal ⟨2m(x)⟩ of Rn has 2^{n−r} elements and every element
of ⟨2m(x)⟩ is uniquely expressible in the form 2m(x)a(x), where a(x) has degree less than
n − r with coefficients 0 and 1 only.
483
12.3 Cyclic codes over Z4
Corollary 12.3.14 With the notation as in Theorem 12.3.13, if g(x) = 1, then C = ⟨f(x)⟩,
and C has type 4^{n−deg f}. If h(x) = 1, then C = ⟨2f(x)⟩, and C has type 2^{n−deg f}.
Exercise 720 Prove Corollary 12.3.14.
Corollary 12.3.15 Let n be odd. Assume that x^n − 1 is a product of k irreducible polynomials in Z4[x]. Then there are 3^k cyclic codes over Z4 of length n.
Proof: Let x^n − 1 = g1(x)g2(x)···gk(x) be the factorization of x^n − 1 into monic irreducible polynomials. If C is a cyclic code, by Theorem 12.3.13, C = ⟨f(x)g(x)⟩ ⊕
⟨2f(x)h(x)⟩ where x^n − 1 = f(x)g(x)h(x). Each gi(x) is a factor of exactly one of f(x),
g(x), or h(x). The result follows.
Example 12.3.16 By Example 12.3.8, x^7 − 1 = g1(x)g2(x)g3(x), where g1(x) = x − 1,
g2(x) = x^3 + 2x^2 + x − 1, and g3(x) = x^3 − x^2 + 2x − 1 are the monic irreducible factors
of x^7 − 1. By Corollary 12.3.15, there are 3^3 = 27 cyclic codes over Z4 of length 7. In
Table 12.1 we give the generator polynomials of the 25 nontrivial cyclic codes of length
7 as described in Theorem 12.3.13. Each generator is given as a product of some of
the gi(x)s. If a single generator a is given, then the code is ⟨a(x)⟩; these are the codes
that come from Corollary 12.3.14. If a pair (a, 2b) of generators is given, then the code
is ⟨a(x)⟩ ⊕ ⟨2b(x)⟩. The information on idempotent generators and duality is discussed
later.
Exercise 721 Verify that all the nontrivial cyclic codes of length 7 over Z4 have been
accounted for in Table 12.1, and show that the codes with given generator polynomials have
the indicated types.
If C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩ as in Theorem 12.3.13, we can easily write down a
generator matrix G for C. The first deg h rows of G correspond to x^i f(x)g(x) for 0 ≤ i ≤
deg h − 1. The last deg g rows of G correspond to 2x^i f(x)h(x) for 0 ≤ i ≤ deg g − 1.
Example 12.3.17 Consider code No. 22 in Table 12.1. Since g1(x)g3(x) = 1 + x + 3x^2 +
2x^3 + x^4 and 2g2(x) = 2 + 2x + 2x^3, one generator matrix for this code is

    [1 1 3 2 1 0 0]
    [0 1 1 3 2 1 0]
    [0 0 1 1 3 2 1]
    [2 2 0 2 0 0 0]
    [0 2 2 0 2 0 0]
    [0 0 2 2 0 2 0]
    [0 0 0 2 2 0 2].
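The span of these seven rows can be enumerated directly to confirm the type 4^3 2^4 claimed for code No. 22 in Table 12.1. This quick sketch is ours, not from the text; it takes the first three rows with coefficients from Z4 and the last four (which have order 2) with coefficients from {0, 1}:

```python
from itertools import product

# Rows of the generator matrix from Example 12.3.17 (code No. 22).
rows4 = [(1,1,3,2,1,0,0), (0,1,1,3,2,1,0), (0,0,1,1,3,2,1)]
rows2 = [(2,2,0,2,0,0,0), (0,2,2,0,2,0,0), (0,0,2,2,0,2,0), (0,0,0,2,2,0,2)]

code = set()
for c4 in product(range(4), repeat=3):          # order-4 rows
    for c2 in product(range(2), repeat=4):      # order-2 rows
        word = [0] * 7
        for c, row in zip(c4 + c2, rows4 + rows2):
            word = [(w + c * r) % 4 for w, r in zip(word, row)]
        code.add(tuple(word))
print(len(code))   # 1024 = 4^3 * 2^4, matching the type in Table 12.1
```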
As with cyclic codes over Fq, we can find generator polynomials of the dual codes. Let
f(x) = a_d x^d + a_{d−1} x^{d−1} + ··· + a_0 ∈ Z4[x] with a_d ≠ 0. Define the reciprocal polynomial f*(x) to be

f*(x) = ±x^d f(x^{−1}) = ±(a_0 x^d + a_1 x^{d−1} + ··· + a_d).
Table 12.1 Generators of cyclic codes over Z4 of length 7

Code     Generator          "Generating
number   polynomials        idempotents"         Type       Dual code
  1      g2g3               ê1                   4          6
  2      g1g2               ê3                   4^3        4
  3      g1g3               ê2                   4^3        5
  4      g2                 ê1 + ê3              4^4        2
  5      g3                 ê1 + ê2              4^4        3
  6      g1                 ê2 + ê3              4^6        1
  7      2g2g3              2ê1                  2          25
  8      2g1g2              2ê3                  2^3        23
  9      2g1g3              2ê2                  2^3        24
 10      2g2                2ê1 + 2ê3            2^4        21
 11      2g3                2ê1 + 2ê2            2^4        22
 12      2g1                2ê2 + 2ê3            2^6        16
 13      2                  2                    2^7        self-dual
 14      (g2g3, 2g2g1)      ê1 + 2ê3             4·2^3      19
 15      (g3g2, 2g3g1)      ê1 + 2ê2             4·2^3      20
 16      (g2g3, 2g1)        ê1 + 2ê2 + 2ê3       4·2^6      12
 17      (g2g1, 2g2g3)      ê3 + 2ê1             4^3·2      self-dual
 18      (g3g1, 2g3g2)      ê2 + 2ê1             4^3·2      self-dual
 19      (g1g2, 2g1g3)      ê3 + 2ê2             4^3·2^3    14
 20      (g1g3, 2g1g2)      ê2 + 2ê3             4^3·2^3    15
 21      (g1g2, 2g3)        ê3 + 2ê1 + 2ê2       4^3·2^4    10
 22      (g1g3, 2g2)        ê2 + 2ê1 + 2ê3       4^3·2^4    11
 23      (g2, 2g1g3)        ê1 + ê3 + 2ê2        4^4·2^3    8
 24      (g3, 2g1g2)        ê1 + ê2 + 2ê3        4^4·2^3    9
 25      (g1, 2g2g3)        ê2 + ê3 + 2ê1        4^6·2      7
We choose the ± sign so that the leading coefficient of f*(x) is 1 or 2. The proof of
Lemma 4.4.8 also proves the following lemma.

Lemma 12.3.18 Let a = a_0 a_1 ··· a_{n−1} and b = b_0 b_1 ··· b_{n−1} be vectors in Z4^n with associated polynomials a(x) and b(x). Then a is orthogonal to b and all its shifts if and only if
a(x)b*(x) = 0 in Rn.

We observe that if n is odd and gi(x) is a monic irreducible factor of x^n − 1 in Z4[x],
then gi*(x) is also a monic irreducible factor of x^n − 1 as the constant term of gi(x) is ±1.
The reciprocal polynomial µ(gi*(x)) of µ(gi(x)) in F2[x] is an irreducible factor of x^n + 1
in F2[x]. Thus gi*(x) is a basic irreducible factor of x^n − 1. By the unique factorization of
x^n − 1 from Theorem 12.3.7, gi*(x) = gj(x) for some j since gi*(x) is monic.

Example 12.3.19 In the notation of Example 12.3.16, g1*(x) = g1(x), g2*(x) = g3(x), and
g3*(x) = g2(x).
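Example 12.3.19 is easy to confirm mechanically. The helper below is our own (not from the text); coefficient lists run from the constant term up, and the scaling follows the sign convention just stated:

```python
def reciprocal(f):
    """f*(x) over Z4, scaled so the leading coefficient is 1 or 2."""
    r = list(reversed(f))           # coefficients of x^d f(1/x), low degree first
    while r and r[-1] == 0:
        r.pop()
    if r[-1] not in (1, 2):         # leading coefficient 3 = -1: negate mod 4
        r = [(-c) % 4 for c in r]
    return r

g1 = [3, 1]            # g1(x) = -1 + x
g2 = [3, 1, 2, 1]      # g2(x) = -1 + x + 2x^2 + x^3
g3 = [3, 2, 3, 1]      # g3(x) = -1 + 2x - x^2 + x^3
print(reciprocal(g1) == g1, reciprocal(g2) == g3, reciprocal(g3) == g2)   # True True True
```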
Theorem 12.3.20 If C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩ is a cyclic code of odd length n over
Z4 with f(x)g(x)h(x) = x^n − 1, then C⊥ = ⟨h*(x)g*(x)⟩ ⊕ ⟨2h*(x)f*(x)⟩. Furthermore,
if g(x) = 1, then C = ⟨f(x)⟩ and C⊥ = ⟨h*(x)⟩; if h(x) = 1, then C = ⟨2f(x)⟩ and C⊥ =
⟨g*(x)⟩ ⊕ ⟨2f*(x)⟩.

Proof: Let s(x) = h*(x)g*(x); then s*(x) = ±h(x)g(x) and f(x)g(x)s*(x) = ±(x^n − 1)g(x)
in Z4[x], implying f(x)g(x)s*(x) = 0 in Rn. In the same way 2f(x)h(x)s*(x) = 0.
Thus h*(x)g*(x) ∈ C⊥ by Lemma 12.3.18. Likewise 2h*(x)f*(x) is orthogonal to f(x)g(x)
and its cyclic shifts. As 2a(x)2b*(x) = 0 in Rn for all a(x) and b(x) in Z4[x], 2h*(x)f*(x)
is orthogonal to 2f(x)h(x) and its cyclic shifts. Thus 2h*(x)f*(x) ∈ C⊥. As x^n − 1 =
f*(x)g*(x)h*(x), by Theorem 12.3.13, ⟨h*(x)g*(x)⟩ + ⟨2h*(x)f*(x)⟩ is a direct sum. Thus

⟨h*(x)g*(x)⟩ ⊕ ⟨2h*(x)f*(x)⟩ ⊆ C⊥.  (12.11)

Since C has type 4^{deg h} 2^{deg g}, C⊥ has type 4^{n−deg h−deg g} 2^{deg g} by (12.3). But ⟨h*(x)g*(x)⟩ ⊕
⟨2h*(x)f*(x)⟩ has type 4^{deg f*} 2^{deg g*} = 4^{deg f} 2^{deg g}. As these types agree, we have equality
in (12.11) proving the first statement. The second statement follows from the first by direct computation as g(x) = 1 implies g*(x) = 1 and h*(x)f*(x) = x^n − 1, while h(x) = 1
implies h*(x) = 1.
Example 12.3.21 Code No. 14 in Table 12.1 has f(x) = g2(x), g(x) = g3(x), and h(x) =
g1(x). Hence its dual is ⟨g1*(x)g3*(x)⟩ ⊕ ⟨2g1*(x)g2*(x)⟩ = ⟨g1(x)g2(x)⟩ ⊕ ⟨2g1(x)g3(x)⟩,
which is code No. 19. Code No. 22 has f(x) = 1, g(x) = g1(x)g3(x), and h(x) = g2(x). So
its dual is ⟨g2*(x)g1*(x)g3*(x)⟩ ⊕ ⟨2g2*(x)⟩ = ⟨2g3(x)⟩, which is code No. 11. Code No. 17 has
f(x) = g2(x), g(x) = g1(x), and h(x) = g3(x). Its dual is ⟨g3*(x)g1*(x)⟩ ⊕ ⟨2g3*(x)g2*(x)⟩ =
⟨g2(x)g1(x)⟩ ⊕ ⟨2g2(x)g3(x)⟩, making the code self-dual. There are three cyclic self-dual
codes of length 7.
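These duality computations can be double-checked numerically. The sketch below (helper names are ours) spans code No. 22 and code No. 11 over Z4 and verifies that they are orthogonal with complementary sizes, so each is the dual of the other:

```python
from itertools import product

def mul(a, b, n=7):
    """Multiply coefficient lists (constant term first) in Z4[x]/(x^7 - 1)."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] = (out[(i + j) % n] + ai * bj) % 4
    return out

g1g3   = [1, 1, 3, 2, 1, 0, 0]   # g1(x)g3(x), as in Example 12.3.17
two_g2 = [2, 2, 0, 2, 0, 0, 0]   # 2 g2(x)
two_g3 = [2, 0, 2, 2, 0, 0, 0]   # 2 g3(x)

code22 = {tuple((u + v) % 4 for u, v in zip(mul(g1g3, list(a) + [0] * 4),
                                            mul(two_g2, list(b) + [0] * 3)))
          for a in product(range(4), repeat=3)
          for b in product(range(2), repeat=4)}
code11 = {tuple(mul(two_g3, list(b) + [0] * 3)) for b in product(range(2), repeat=4)}

orth = all(sum(u * v for u, v in zip(c, d)) % 4 == 0
           for c in code22 for d in code11)
print(orth, len(code22) * len(code11) == 4 ** 7)   # True True
```

Since the two codes are orthogonal and their sizes multiply to 4^7, code No. 11 is exactly the dual of code No. 22.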
Exercise 722 Verify that the duals of the cyclic codes of length 7 over Z4 are as indicated
in Table 12.1.
12.3.4 Generating idempotents of cyclic codes over Z4
Let C be a nonzero cyclic code over Z4 of odd length n. By Theorem 12.3.13 there exist unique monic polynomials f(x), g(x), and h(x) such that x^n − 1 = f(x)g(x)h(x) and
C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩. By Theorem 12.3.12(vi), ⟨f(x)g(x)⟩ and ⟨2f(x)h(x)⟩ can
be expressed as direct sums of ideals of the forms ⟨ĝi(x)⟩ and ⟨2ĝi(x)⟩, respectively,
where x^n − 1 = g1(x)g2(x)···gk(x) is a factorization of x^n − 1 into irreducible polynomials in Z4[x] and ĝi(x) = ∏_{j≠i} gj(x). Thus ⟨f(x)g(x)⟩ = ∑_{i∈I} ⟨ĝi(x)⟩ and ⟨2f(x)h(x)⟩ =
∑_{j∈J} ⟨2ĝj(x)⟩ for some subsets I and J of {1, 2, ..., k}. Theorem 12.3.12(iv) shows
that ⟨ĝi(x)⟩ = ⟨êi(x)⟩, where êi(x) is an idempotent of Rn. By Exercise 723(a) and (b),
⟨f(x)g(x)⟩ = ⟨e(x)⟩ and ⟨f(x)h(x)⟩ = ⟨E(x)⟩, where e(x) and E(x) are idempotents in Rn.
This proves the following theorem.
Theorem 12.3.22 Let C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩ be a nonzero cyclic code over Z4 of
odd length n where x^n − 1 = f(x)g(x)h(x). Then:
(i) if g(x) = 1, C = ⟨f(x)⟩ = ⟨e(x)⟩,
(ii) if h(x) = 1, C = ⟨2f(x)⟩ = ⟨2E(x)⟩, and
(iii) if g(x) ≠ 1 and h(x) ≠ 1, C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩ = ⟨e(x)⟩ ⊕ ⟨2E(x)⟩,
where e(x) and E(x) are nonzero idempotents in Rn.
Exercise 723 Let I ⊆ {1, 2, ..., k} and let I^c = {1, 2, ..., k} \ I be the complement of I.
Do the following:
(a) Prove that ∑_{i∈I} êi(x) is an idempotent in Rn. Hint: Recall that {êi(x) | 1 ≤ i ≤ k} are
pairwise orthogonal idempotents.
(b) Prove that ∑_{i∈I} ⟨ĝi(x)⟩ = ⟨∑_{i∈I} êi(x)⟩.
(c) Prove that ⟨∏_{j∈I^c} gj(x)⟩ = ⟨∑_{i∈I} êi(x)⟩. Hint: See the proof of Theorem 12.3.13.
By Theorem 12.3.22, C = ⟨e(x)⟩ ⊕ ⟨2E(x)⟩, where e(x) and E(x) are idempotents. By
Exercise 724, C = ⟨e(x) + 2E(x)⟩. We call e(x) + 2E(x) the "generating idempotent" of C.
We use quotation marks because e(x) + 2E(x) is not really an idempotent (unless E(x) =
0); it is appropriate to use this term, however, because e(x) and E(x) are idempotents that
in combination generate C. We will use the term generating idempotent, without quotation
marks, when E(x) = 0 and hence C is actually generated by an idempotent (case (i) of
Theorem 12.3.22).

Exercise 724 Let C = ⟨e(x)⟩ ⊕ ⟨2E(x)⟩, where e(x) and E(x) are idempotents. Show that
C = ⟨e(x) + 2E(x)⟩.
Recall that a multiplier µa, defined on {0, 1, ..., n − 1} by iµa ≡ ia (mod n), is a permutation of the coordinate positions {0, 1, ..., n − 1} of a cyclic code of length n provided gcd(a, n) = 1. We can apply multipliers to Z4-linear codes. As with cyclic codes
over Fq, µa acts on Rn by f(x)µa ≡ f(x^a) (mod x^n − 1) for f(x) ∈ Rn. See (4.4). Also
(f(x)g(x))µa = (f(x)µa)(g(x)µa) for f(x) and g(x) in Rn. This implies that if e(x) is
an idempotent in Rn, so is e(x)µa. In some circumstances we can compute the generating
idempotents of the dual of a cyclic code over Z4 and of the intersection and sum of two such
codes; compare this to Theorems 4.3.7 and 4.4.9.
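The multiplicativity of µa is easy to sanity-check numerically in R7; here is a quick sketch (helper names are ours) with a = 3 and the factors g2(x), g3(x):

```python
def mul(a, b, n=7):
    """Multiply coefficient lists (constant term first) in Z4[x]/(x^7 - 1)."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] = (out[(i + j) % n] + ai * bj) % 4
    return out

def mu(f, a, n=7):
    """Apply the multiplier mu_a: x^i -> x^(ia mod n) to a coefficient list."""
    out = [0] * n
    for i, c in enumerate(f):
        out[(i * a) % n] = (out[(i * a) % n] + c) % 4
    return out

f = [3, 1, 2, 1, 0, 0, 0]   # g2(x)
g = [3, 2, 3, 1, 0, 0, 0]   # g3(x)
print(mu(mul(f, g), 3) == mul(mu(f, 3), mu(g, 3)))   # True: mu_3 is multiplicative
```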
Lemma 12.3.23 The following hold:
(i) Let C = ⟨e(x)⟩ be a Z4-linear cyclic code with generating idempotent e(x). Then the
generating idempotent for C⊥ is 1 − e(x)µ−1.
(ii) For i = 1 and 2 let Ci = ⟨ei(x)⟩ be Z4-linear cyclic codes with generating idempotents
ei(x). Then the generating idempotent for C1 ∩ C2 is e1(x)e2(x), and the generating
idempotent for C1 + C2 is e1(x) + e2(x) − e1(x)e2(x).

Proof: Since e(x)(1 − e(x)) = 0 in Rn, e(x) is orthogonal to the reciprocal polynomial of
1 − e(x) and all of its shifts by Lemma 12.3.18; but these are merely scalar multiples of the
cyclic shifts of 1 − e(x)µ−1. Thus 1 − e(x)µ−1 ∈ C⊥. In the notation of Theorem 12.3.13,
C has generator polynomial f(x) where g(x) = 1 and f(x)h(x) = x^n − 1 because C has
a generating idempotent (as opposed to being generated by an idempotent plus twice an
idempotent). By Theorem 12.3.20, C⊥ has generator polynomial h*(x); however, 1 − e(x)
is the generating idempotent of ⟨h(x)⟩ by Theorem 12.3.12. Hence 1 − e(x)µ−1 is the
generating idempotent of ⟨h*(x)⟩ = C⊥.
For (ii), clearly, e1(x)e2(x) ∈ C1 ∩ C2. Thus ⟨e1(x)e2(x)⟩ ⊆ C1 ∩ C2. If c(x) ∈ C1 ∩ C2,
then c(x) = s(x)e1(x) and c(x)e2(x) = c(x), the latter by Exercise 725(a); thus c(x) =
c(x)e2(x) = s(x)e1(x)e2(x) ∈ ⟨e1(x)e2(x)⟩ and C1 ∩ C2 = ⟨e1(x)e2(x)⟩. Next if c(x) =
c1(x) + c2(x) where ci(x) ∈ Ci for i = 1 and 2, then by Exercise 725(a), c(x)(e1(x) +
e2(x) − e1(x)e2(x)) = c1(x) + c1(x)e2(x) − c1(x)e2(x) + c2(x)e1(x) + c2(x) − c2(x)e1(x) =
c(x). Since e1(x) + e2(x) − e1(x)e2(x) is clearly in C1 + C2, C1 + C2 = ⟨e1(x) + e2(x) −
e1(x)e2(x)⟩ by Exercise 725(b).
Exercise 725 Let C be a cyclic code over either Fq or Z4 . Do the following:
(a) Let e(x) be a generating idempotent of C. Prove that c(x) ∈ C if and only if c(x)e(x) =
c(x).
(b) Prove that if e(x) ∈ C and c(x)e(x) = c(x) for all c(x) ∈ C, then e(x) is the generating
idempotent of C.
One way to construct the "generating idempotent" of C is to begin with the primitive
idempotents. We leave the proof of the following as an exercise.

Theorem 12.3.24 Let n be odd and g1(x)g2(x)···gk(x) be a factorization of x^n − 1 into
irreducible polynomials in Z4[x]. Let ĝi(x) = ∏_{j≠i} gj(x) and ⟨ĝi(x)⟩ = ⟨êi(x)⟩, where
êi(x) is an idempotent of Rn. Let C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩ where f(x)g(x)h(x) =
x^n − 1. Finally, suppose that f(x)g(x) = ∏_{i∈I} gi(x) and f(x)h(x) = ∏_{j∈J} gj(x) for
subsets I and J of {1, 2, ..., k}. Then:
(i) I^c ∩ J^c = ∅, where I^c and J^c are the complements of I and J, respectively, in
{1, 2, ..., k}, and
(ii) C = ⟨e(x) + 2E(x)⟩, where e(x) = ∑_{i∈I^c} êi(x) and E(x) = ∑_{j∈J^c} êj(x).
Exercise 726 Fill in the details of the proof of Theorem 12.3.24.
There is a method for constructing idempotents, including the primitive ones, in Rn found
in [27, 289]. The method begins with binary idempotents in R̄n; these can be obtained from
generator polynomials using the Euclidean Algorithm as given in the proof of Theorem 4.3.2.

Theorem 12.3.25 Let ⟨b(x)⟩ = ⟨µ(f(x))⟩ where f(x)g(x) = x^n − 1 in Z4[x], b(x) is a
binary idempotent in R̄n = F2[x]/(x^n + 1), and n is odd. Let e(x) = b^2(x) where b^2(x) is
computed in Z4[x]. Then e(x) is the generating idempotent for ⟨f(x)⟩.
Proof: As b^2(x) = b(x) in R̄n, µ(e(x)) = µ(b^2(x)) = b(x) + µ(a(x)(x^n − 1)) for some
a(x) ∈ Z4[x], implying that e(x) = b(x) + a(x)(x^n − 1) + 2s(x) for some s(x) ∈ Z4[x].
Squaring and simplifying yields e^2(x) = b^2(x) + d(x)(x^n − 1) in Z4[x], or e^2(x) = e(x) in
Rn. Thus e(x) is an idempotent in Rn.
By Theorem 4.4.6, 1 + b(x) is the generating idempotent in R̄n of ⟨µ(g(x))⟩. So 1 +
b(x) = r(x)g(x) + 2s(x) for some r(x) and s(x) in Z4[x]. Hence b(x) = 1 + r(x)g(x) +
2(1 + s(x)); squaring this, we obtain e(x) = b^2(x) = 1 + t(x)g(x) for some t(x) in Z4[x].
So e(x)f(x) = f(x) + t(x)(x^n − 1) in Z4[x] implying f(x) ∈ ⟨e(x)⟩ in Rn, or ⟨f(x)⟩ ⊆
⟨e(x)⟩.
Since e(x) = b^2(x), µ(e(x)) = µ(b^2(x)) ∈ ⟨µ(f(x))⟩ in R̄n. So e(x) = u(x)f(x) +
2v(x) in Z4[x]. Squaring yields e^2(x) = u^2(x)f^2(x), implying that e^2(x) ∈ ⟨f(x)⟩ in Rn. As
e(x) is an idempotent in Rn, e(x) ∈ ⟨f(x)⟩, or ⟨e(x)⟩ ⊆ ⟨f(x)⟩. Therefore ⟨e(x)⟩ = ⟨f(x)⟩
and e(x) = b^2(x) is the generating idempotent for ⟨f(x)⟩.
Example 12.3.26 In Example 12.3.8 we found the factorization of x^7 − 1 over Z4[x]
to be x^7 − 1 = g1(x)g2(x)g3(x), where g1(x) = x − 1, g2(x) = x^3 + 2x^2 + x − 1, and
g3(x) = x^3 − x^2 + 2x − 1. We wish to find the generating idempotents of ⟨ĝi(x)⟩ in R7,
where ĝi(x) = ∏_{j≠i} gj(x). We follow the notation of Theorem 12.3.24 and the method of
Theorem 12.3.25. First, µ(ĝ1(x)) = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1. By Example 4.3.4,
b1(x) = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1 is the generating idempotent of ⟨µ(ĝ1(x))⟩. With
analogous notation, µ(ĝ2(x)) = x^4 + x^2 + x + 1, b2(x) = x^4 + x^2 + x + 1, µ(ĝ3(x)) =
x^4 + x^3 + x^2 + 1, and b3(x) = x^6 + x^5 + x^3 + 1. Thus by Theorem 12.3.25:

ê1(x) = −x^6 − x^5 − x^4 − x^3 − x^2 − x − 1,
ê2(x) = 2x^6 + 2x^5 − x^4 + 2x^3 − x^2 − x + 1,
ê3(x) = −x^6 − x^5 + 2x^4 − x^3 + 2x^2 + 2x + 1.

By Theorem 12.3.22 and Exercise 724, each cyclic code has a "generating idempotent"
e(x) + 2E(x). In Table 12.1, we present the "generating idempotents" for the 25 nontrivial
cyclic codes over Z4 of length 7. The idempotents e(x) and E(x) will each be a sum of
some of ê1(x), ê2(x), and ê3(x). In the table we list e(x) + 2E(x). Code No. 4 is ⟨g2(x)⟩; so
f(x)g(x) = g2(x) and f(x)h(x) = x^n − 1 implying that e(x) = ê1(x) + ê3(x) and 2E(x) =
0 by Theorem 12.3.24. Code No. 12 is ⟨2g1(x)⟩; so f(x)g(x) = x^n − 1 and f(x)h(x) =
g1(x) implying that e(x) = 0 and 2E(x) = 2ê2(x) + 2ê3(x). Code No. 20 is ⟨g1(x)g3(x)⟩ +
⟨2g1(x)g2(x)⟩; so f(x)g(x) = g1(x)g3(x) and f(x)h(x) = g1(x)g2(x) implying that e(x) =
ê2(x) and 2E(x) = 2ê3(x). Thus code No. 20 is the ideal generated by ê2(x) + 2ê3(x).
Finally, code No. 21 is ⟨g1(x)g2(x)⟩ + ⟨2g3(x)⟩; so f(x)g(x) = g1(x)g2(x) and f(x)h(x) =
g3(x) implying that e(x) = ê3(x) and 2E(x) = 2ê1(x) + 2ê2(x). Therefore code No. 21 is
the ideal generated by ê3(x) + 2ê1(x) + 2ê2(x).
Exercise 727 Do the following:
(a) Verify that the method described in Theorem 12.3.25 actually leads to the idempotents
êi(x) given in Example 12.3.26.
(b) Verify that the entries in the column “Generating idempotents” of Table 12.1 are correct
as given.
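In the same spirit as Exercise 727, a row of Table 12.1 can be verified end to end. This sketch (ours, not from the text) checks that code No. 20, ⟨g1(x)g3(x)⟩ ⊕ ⟨2g1(x)g2(x)⟩, equals the ideal generated by ê2(x) + 2ê3(x):

```python
from itertools import product

def mul(a, b, n=7):
    """Multiply coefficient lists (constant term first) in Z4[x]/(x^7 - 1)."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] = (out[(i + j) % n] + ai * bj) % 4
    return out

b2 = [1, 1, 1, 0, 1, 0, 0]                       # b2(x) from Example 12.3.26
b3 = [1, 0, 0, 1, 0, 1, 1]                       # b3(x)
e2, e3 = mul(b2, b2), mul(b3, b3)
gen = [(u + 2 * v) % 4 for u, v in zip(e2, e3)]  # e2(x) + 2 e3(x)

g1g3     = [1, 1, 3, 2, 1, 0, 0]                 # f(x)g(x) for code No. 20
two_g1g2 = [2, 0, 2, 2, 2, 0, 0]                 # 2 f(x)h(x)

code20 = {tuple((u + v) % 4 for u, v in zip(mul(g1g3, list(a) + [0] * 4),
                                            mul(two_g1g2, list(b) + [0] * 4)))
          for a in product(range(4), repeat=3)
          for b in product(range(2), repeat=3)}
ideal = {tuple(mul(gen, list(a))) for a in product(range(4), repeat=7)}
print(code20 == ideal, len(code20))   # True 512 (type 4^3 2^3)
```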
12.4 Quadratic residue codes over Z4
In Section 6.6 we examined the family of quadratic residue codes over Fq . Among the cyclic
codes over Z4 are codes that have properties similar to the ordinary quadratic residue codes
and are thus termed Z4 -quadratic residue codes [27, 279, 289]. We will give the generating
idempotents and basic properties of these codes. Before doing so we establish our notation
and present two preliminary lemmas.
Throughout this section, p will be a prime with p ≡ ±1 (mod 8). Let Q_p denote the set
of nonzero quadratic residues modulo p, and let N_p be the set of quadratic non-residues
modulo p.

Lemma 12.4.1 Define r by p = 8r ± 1. Let k ∈ F_p with k ≠ 0. Let NQ_p(k) be the number
of unordered pairs {{i, j} | i + j = k, i ≠ j, i and j in Q_p}. Define NN_p(k) analogously.
Then NQ_p(k) = r − 1 if k ∈ Q_p, NQ_p(k) = r if k ∈ N_p, NN_p(k) = r − 1 if k ∈ N_p, and
NN_p(k) = r if k ∈ Q_p.
Proof: We compute NQ_p(k) first. By [66, p. 46] and Lemma 6.2.4, the number of pairs
(x, y) ∈ F_p^2 with

x^2 + y^2 = k  (12.12)

is 8r. Solutions of (12.12) lead to solutions of i + j = k with i, j ∈ Q_p by setting {i, j} =
{x^2, y^2}. Three possibilities arise:
(a) x = 0 or y = 0,
(b) x ≠ 0 and y ≠ 0 with x = ±y, and
(c) x ≠ 0 and y ≠ 0 with x ≠ ±y.
Solutions of (12.12) in either form (a) or (b) lead to solutions of i + j = k where i = 0,
j = 0, or i = j and hence are not counted in NQ_p(k). When k ∈ N_p, (a) cannot occur;
also (b) cannot occur because x^2 + y^2 = k and x = ±y implies 2x^2 = k, which means
2 ∈ N_p, contradicting Lemma 6.2.5. So if k ∈ N_p, all 8r solutions of (12.12) have form
(c). When k ∈ Q_p, there are four solutions of form (a), namely (±γ, 0) and (0, ±γ), where
±γ are the two solutions of z^2 = k. When k ∈ Q_p, there are four solutions of (12.12) of
form (b), namely (±γ, ±γ), where ±γ are the two solutions of 2z^2 = k, noting that 2 ∈ Q_p
by Lemma 6.2.5. So if k ∈ Q_p, 8r − 8 solutions of (12.12) have form (c). In case (c) with
k ∈ Q_p or k ∈ N_p, any set of eight solutions of (12.12) in the group (±x, ±y) and (±y, ±x)
leads to the same solution of i + j = k counted by NQ_p(k). Therefore, NQ_p(k) = r − 1 if
k ∈ Q_p and NQ_p(k) = r if k ∈ N_p.
Let α ∈ N_p. Then i + j = k if and only if iα + jα = kα. Hence NN_p(k) = NQ_p(kα).
Therefore, NN_p(k) = r − 1 if k ∈ N_p and NN_p(k) = r if k ∈ Q_p.
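Lemma 12.4.1 is easy to test numerically for a small prime; here is a sketch (ours, not from the text) for p = 7, where r = 1:

```python
p, r = 7, 1                                   # p = 8r - 1
Q = {i * i % p for i in range(1, p)}          # nonzero quadratic residues mod p
N = set(range(1, p)) - Q                      # non-residues

def pairs(S, k):
    """Unordered pairs {i, j}, i != j, from S with i + j = k (mod p)."""
    return sum(1 for i in S for j in S if i < j and (i + j) % p == k)

print(all(pairs(Q, k) == (r - 1 if k in Q else r) for k in range(1, p)))  # True
print(all(pairs(N, k) == (r - 1 if k in N else r) for k in range(1, p)))  # True
```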
Let Q(x) = ∑_{i∈Q_p} x^i and N(x) = ∑_{i∈N_p} x^i. Note that 1, Q(x), and N(x) are idempotents in R̄_p = F2[x]/(x^p + 1). As discovered in [279, 289], a combination of these will be
idempotents in R_p that lead to the definition of Z4-quadratic residue codes. A multiple of the
all-one vector is also an idempotent; let j(x) = p ∑_{i=0}^{p−1} x^i. In particular, j(x) = 3 ∑_{i=0}^{p−1} x^i
if p ≡ −1 (mod 8) and j(x) = ∑_{i=0}^{p−1} x^i if p ≡ 1 (mod 8).
Lemma 12.4.2 Define r by p = 8r ± 1. If r is odd, then Q(x) + 2N (x), N (x) + 2Q(x),
1 − Q(x) + 2N (x), and 1 − N (x) + 2Q(x) are idempotents in R p . If r is even, then
−Q(x), −N (x), 1 + Q(x), and 1 + N (x) are idempotents in R p . Also, for r even or
odd, j(x) is an idempotent in R p .
Proof: We prove only the case when r is odd and leave the case when r is even to
Exercise 728. Working in R_p, by Lemma 12.4.1,

Q(x)^2 = (∑_{i∈Q_p} x^i)^2 = ∑_{i∈Q_p} x^{2i} + ∑_{i≠j, i,j∈Q_p} x^{i+j}
       = Q(x) + 2[(r − 1)Q(x) + rN(x)] = Q(x) + 2N(x)

since r is odd and 2 ∈ Q_p by Lemma 6.2.5. Similarly,

N(x)^2 = (∑_{i∈N_p} x^i)^2 = ∑_{i∈N_p} x^{2i} + ∑_{i≠j, i,j∈N_p} x^{i+j}
       = N(x) + 2[(r − 1)N(x) + rQ(x)] = N(x) + 2Q(x).

So (Q(x) + 2N(x))^2 = Q(x)^2 = Q(x) + 2N(x), (N(x) + 2Q(x))^2 = N(x)^2 = N(x) + 2Q(x),
(1 − Q(x) + 2N(x))^2 = (1 − Q(x))^2 = 1 − 2Q(x) + Q(x)^2 = 1 − Q(x) + 2N(x), and (1 −
N(x) + 2Q(x))^2 = (1 − N(x))^2 = 1 − 2N(x) + N(x)^2 = 1 − N(x) + 2Q(x).
Finally, j(x)^2 = p^2 ∑_{i=0}^{p−1} x^i ∑_{j=0}^{p−1} x^j = p^2 p ∑_{i=0}^{p−1} x^i = j(x) since p^2 ≡ 1 (mod 8).
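The case r odd can also be spot-checked numerically. The sketch below (ours, not from the text) squares Q(x) + 2N(x) in Z4[x]/(x^p − 1) for p = 23, where r = 3:

```python
p = 23                                        # p = 8r - 1 with r = 3 odd
Q = {i * i % p for i in range(1, p)}          # nonzero quadratic residues mod 23
coeffs = [0] * p
for i in range(1, p):
    coeffs[i] = 1 if i in Q else 2            # coefficient list of Q(x) + 2N(x)

sq = [0] * p                                  # square in Z4[x]/(x^23 - 1)
for i in range(p):
    for j in range(p):
        sq[(i + j) % p] = (sq[(i + j) % p] + coeffs[i] * coeffs[j]) % 4
print(sq == coeffs)   # True: Q(x) + 2N(x) is an idempotent in R_23
```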
Exercise 728 Prove Lemma 12.4.2 when r is even.
We now define the Z4 -quadratic residue codes using the idempotents of Lemma 12.4.2.
The definitions depend upon the value of p modulo 8.
12.4.1 Z4 -quadratic residue codes: p ≡ −1 (mod 8)
We first look at the case where p ≡ −1 (mod 8). Let p + 1 = 8r. If r is odd, define D1 = ⟨Q(x) + 2N(x)⟩, D2 = ⟨N(x) + 2Q(x)⟩, C1 = ⟨1 − N(x) + 2Q(x)⟩, and C2 =
⟨1 − Q(x) + 2N(x)⟩. If r is even, define D1 = ⟨−Q(x)⟩, D2 = ⟨−N(x)⟩, C1 = ⟨1 + N(x)⟩,
and C2 = ⟨1 + Q(x)⟩. These codes are called Z4-quadratic residue codes when p ≡
−1 (mod 8).
The next result, from [279, 289], shows why these codes are called quadratic residue
codes; compare this result with those in Chapter 6.
Theorem 12.4.3 Let p ≡ −1 (mod 8). The Z4-quadratic residue codes satisfy the
following:
(i) Diµa = Di and Ciµa = Ci for a ∈ Q_p; D1µa = D2 and C1µa = C2 for a ∈ N_p; in
particular D1 and D2 are equivalent, as are C1 and C2.
(ii) D1 ∩ D2 = ⟨j(x)⟩ and D1 + D2 = R_p.
(iii) C1 ∩ C2 = {0} and C1 + C2 = ⟨j(x)⟩⊥.
(iv) D1 and D2 have type 4^{(p+1)/2}; C1 and C2 have type 4^{(p−1)/2}.
(v) Di = Ci + ⟨j(x)⟩ for i = 1 and 2.
(vi) C1 and C2 are self-orthogonal and Ci⊥ = Di for i = 1 and 2.
Proof: Suppose that p + 1 = 8r . We verify this result when r is odd and leave the proof
for r even as an exercise.
For (i), if a ∈ N_p, then (Q(x) + 2N(x))µa = N(x) + 2Q(x). Theorem 4.3.13(i) (whose
proof is still valid for cyclic codes over Z4) implies that D1µa = D2. Similarly, if a ∈
Q_p, then (Q(x) + 2N(x))µa = Q(x) + 2N(x) and (N(x) + 2Q(x))µa = N(x) + 2Q(x),
implying Diµa = Di. The parts of (i) involving Ci are similar.
Since p ≡ −1 (mod 8), j(x) = 3 ∑_{i=0}^{p−1} x^i = 3 + 3Q(x) + 3N(x). Thus (Q(x) +
2N(x))(N(x) + 2Q(x)) = (Q(x) + 2N(x))(j(x) + 1 − (Q(x) + 2N(x))) = (Q(x) +
2N(x))j(x) + Q(x) + 2N(x) − (Q(x) + 2N(x))^2 = (Q(x) + 2N(x))j(x) = 3((p − 1)/2) ×
∑_{i=0}^{p−1} x^i + 3(p − 1) ∑_{i=0}^{p−1} x^i = (3/2)(p − 1)j(x) = (12r − 3)j(x) = j(x). By Lemma
12.3.23, D1 ∩ D2 = ⟨j(x)⟩. By the same result, D1 + D2 has generating idempotent
Q(x) + 2N(x) + N(x) + 2Q(x) − (Q(x) + 2N(x))(N(x) + 2Q(x)) = 3Q(x) + 3N(x) −
j(x) = 1. Therefore D1 + D2 = R_p, proving part (ii).
Using (Q(x) + 2N(x))(N(x) + 2Q(x)) = j(x), for (iii) we have (1 − N(x) + 2Q(x)) ×
(1 − Q(x) + 2N(x)) = 1 − N(x) + 2Q(x) − Q(x) + 2N(x) + (N(x) + 2Q(x))(Q(x) +
2N(x)) = 1 + N(x) + Q(x) + j(x) = 0, proving that C1 ∩ C2 = {0}. Since (1 − N(x) +
2Q(x))(1 − Q(x) + 2N(x)) = 0, C1 + C2 has generating idempotent 1 − N(x) + 2Q(x) +
1 − Q(x) + 2N(x) = 2 + N(x) + Q(x) = 1 − j(x) = 1 − j(x)µ−1 as j(x)µ−1 = j(x).
By Lemma 12.3.23, C1 + C2 = ⟨j(x)⟩⊥, completing (iii).
For (iv), we observe that |D1 + D2| = |D1||D2|/|D1 ∩ D2|. By (i), |D1| = |D2|, and by
(ii), |D1 + D2| = 4^p and |D1 ∩ D2| = 4. Thus D1 and D2 have size 4^{(p+1)/2}; this is also
their type as each has an idempotent generator (as opposed to a generator that is the sum
of an idempotent and twice an idempotent; see Theorem 12.3.22 and Exercise 724). The
remainder of (iv) is similar using (i) and (iii).
By (ii), j(x) ∈ D2, implying that (N(x) + 2Q(x))j(x) = j(x) as N(x) + 2Q(x) is the
multiplicative identity of D2. By Lemma 12.3.23, the generating idempotent for C1 + ⟨j(x)⟩
is 1 − N(x) + 2Q(x) + j(x) − (1 − N(x) + 2Q(x))j(x) = 1 − N(x) + 2Q(x) + j(x) −
(j(x) − j(x)) = Q(x) + 2N(x), proving that C1 + ⟨j(x)⟩ = D1. Similarly, C2 + ⟨j(x)⟩ =
D2, completing (v).
Finally, by Lemma 12.3.23, the generating idempotent for C1⊥ is 1 − (1 − N(x) +
2Q(x))µ−1 = N(x)µ−1 + 2Q(x)µ−1. By Lemma 6.2.4, −1 ∈ N_p as p ≡ −1 (mod 8).
Hence N(x)µ−1 = Q(x) and Q(x)µ−1 = N(x). Therefore the generating idempotent for
C1⊥ is Q(x) + 2N(x), verifying that C1⊥ = D1. Similarly, C2⊥ = D2. This also implies that
Ci is self-orthogonal as Ci ⊆ Di from (v), completing (vi).
Exercise 729 Prove Theorem 12.4.3 when r is even.
Example 12.4.4 Four of the cyclic codes of length 7 over Z4 given in Table 12.1 are
quadratic residue codes. In this case p = 7 and r = 1. In the notation of that table
(see Example 12.3.26), ê1(x) = j(x), ê2(x) = 1 − Q(x) + 2N(x), and ê3(x) = 1 − N(x) +
2Q(x). Therefore D1 is code No. 4, D2 is code No. 5, C1 is code No. 2, and C2 is code
No. 3. Notice that the duality conditions given in Table 12.1 for these codes agree with
Theorem 12.4.3(vi).
Exercise 730 Do the following:
(a) Find the generating idempotents for the Z4 -quadratic residue codes of length 23.
(b) Apply the reduction homomorphism µ to the idempotents found in part (a).
(c) What are the binary codes generated by the idempotents found in part (b)?
12.4.2 Z4-quadratic residue codes: p ≡ 1 (mod 8)
When p ≡ 1 (mod 8), we simply reverse the Ci s and Di s from the codes defined in the
case p ≡ −1 (mod 8). Again let p − 1 = 8r. If r is odd, define D1 = ⟨1 − N(x) + 2Q(x)⟩,
D2 = ⟨1 − Q(x) + 2N(x)⟩, C1 = ⟨Q(x) + 2N(x)⟩, and C2 = ⟨N(x) + 2Q(x)⟩. If r is even,
define D1 = ⟨1 + N(x)⟩, D2 = ⟨1 + Q(x)⟩, C1 = ⟨−Q(x)⟩, and C2 = ⟨−N(x)⟩. These
codes are called Z4-quadratic residue codes when p ≡ 1 (mod 8). We leave the proof
of the next result as an exercise. Again see [279, 289].
Theorem 12.4.5 Let p ≡ 1 (mod 8). The Z4-quadratic residue codes satisfy the
following:
(i) Diµa = Di and Ciµa = Ci for a ∈ Q_p; D1µa = D2 and C1µa = C2 for a ∈ N_p; in
particular D1 and D2 are equivalent, as are C1 and C2.
(ii) D1 ∩ D2 = ⟨j(x)⟩ and D1 + D2 = R_p.
(iii) C1 ∩ C2 = {0} and C1 + C2 = ⟨j(x)⟩⊥.
(iv) D1 and D2 have type 4^{(p+1)/2}; C1 and C2 have type 4^{(p−1)/2}.
(v) Di = Ci + ⟨j(x)⟩ for i = 1 and 2.
(vi) C1⊥ = D2 and C2⊥ = D1.
Exercise 731 Prove Theorem 12.4.5.
Exercise 732 Do the following:
(a) Find the generating idempotents for the Z4 -quadratic residue codes of length 17.
(b) Apply the reduction homomorphism µ to the idempotents found in part (a).
(c) What are the binary codes generated by the idempotents found in part (b)?
12.4.3 Extending Z4 -quadratic residue codes
In Section 6.6.3, we described the extensions of quadratic residue codes over Fq. We do the
same for these codes over Z4. Let D1 and D2 be the quadratic residue codes of length p ≡
±1 (mod 8) as described in the two previous subsections. We will define two extensions
of Di. Define

D̂i = {c∞ c0 ··· c_{p−1} | c0 ··· c_{p−1} ∈ Di, c∞ + c0 + ··· + c_{p−1} ≡ 0 (mod 4)}

and

D̃i = {c∞ c0 ··· c_{p−1} | c0 ··· c_{p−1} ∈ Di, −c∞ + c0 + ··· + c_{p−1} ≡ 0 (mod 4)}.

D̂i and D̃i are the extended Z4-quadratic residue codes of length p + 1. Note that D̂i and D̃i are
equivalent under the map that multiplies the extended coordinate by 3. If ci is a codeword
in Di, let ĉi and c̃i be the extended codewords in D̂i and D̃i, respectively.
Exercise 733 Do the following:
(a) Let Ci be the quadratic residue codes of length p ≡ ±1 (mod 8) as described in the two
previous subsections. Prove that the sum of the components of any codeword in Ci is 0
modulo 4. Hint: First show this for the generating idempotents and hence for the cyclic
shifts of the idempotents.
(b) Prove that the sum of the components of j(x) is 1 modulo 4.
(c) Prove that the Euclidean weight of the generating idempotents of Ci is a multiple of 8
if p ≡ −1 (mod 8).
(d) Prove that wt_E(ĵ(x)) = wt_E(j̃(x)) ≡ 0 (mod 8) if p ≡ −1 (mod 8).
Using Exercise 733 along with Theorems 12.4.3(v) and 12.4.5(v) we can find the generator matrices for D̂i and D̃i.

Theorem 12.4.6 Let Gi be the generator matrix for the Z4-quadratic residue code Ci. Then
generator matrices Ĝi and G̃i for D̂i and D̃i, respectively, are:
(i) If p ≡ −1 (mod 8), then

         [3  3 ··· 3]               [1  3 ··· 3]
    Ĝi = [0         ]    and  G̃i = [0         ]
         [⋮    Gi   ]               [⋮    Gi   ]
         [0         ]               [0         ]

(ii) If p ≡ 1 (mod 8), then

         [3  1 ··· 1]               [1  1 ··· 1]
    Ĝi = [0         ]    and  G̃i = [0         ]
         [⋮    Gi   ]               [⋮    Gi   ]
         [0         ]               [0         ]
Exercise 734 Prove Theorem 12.4.6.
The extended codes have the following properties; compare this result to Theorem 6.6.14.

Theorem 12.4.7 Let Di be the Z4-quadratic residue codes of length p. The following hold:
(i) If p ≡ −1 (mod 8), then D̂i and D̃i are self-dual. Furthermore, all codewords of D̂i
and D̃i have Euclidean weights a multiple of 8.
(ii) If p ≡ 1 (mod 8), then D̂1⊥ = D̃2 and D̂2⊥ = D̃1.
Proof: Let p ≡ −1 (mod 8). Then by Theorem 12.4.3(vi), C i⊥ = Di . Hence as the extended
coordinate of any vector in C i is 0 by Exercise 733, the extended codewords arising from C i
i or D
i . Since the inner product of j(x) with itself
are orthogonal to all codewords in either D
i and D
i are
(and j(x) with itself) is 32 ( p + 1) ≡ 0 (mod 4) (and 12 + 32 p ≡ 0 (mod 4)), D
( p+1)/2
self-orthogonal using Theorem 12.4.3(v). By Theorem 12.4.3(iv), Di and Di have 4
i and D
i are self-dual. To verify that the Euclidean weight of
codewords implying that D
i and D
i is a multiple of 8, we only have to verify this for the rows of a
a codeword in D
i and D
i by Theorem 12.2.4. The rows of the generator matrix for D
i
generator matrix of D
and Di are the extensions of j(x) and cyclic shifts of the generating idempotent of C i ; see
i have Euclidean weight
i and D
Theorem 12.4.6. But the rows of the generator matrix for D
a multiple of 8 from Exercise 733.
Suppose now that p ≡ 1 (mod 8). By Theorem 12.4.5(vi), C_1^⊥ = D_2 and C_2^⊥ = D_1. Hence as the extended coordinate of any vector in C_i is 0 by Exercise 733, the extended codewords arising from C_i are orthogonal to all codewords in either D̃_j or D̂_j where j ≠ i. Since the inner product of j̃(x) with ĵ(x) is 3 + p ≡ 0 (mod 4), D̂_j ⊆ D̃_i^⊥ where j ≠ i. Part (ii) now follows from Theorem 12.4.5(iv).
Example 12.4.8 By Theorem 12.4.7, the extended quadratic residue codes D̃_1 and D̃_2 of length p + 1 = 24 are self-dual codes where all codewords have Euclidean weights a multiple of 8. If C is either of these codes, the symmetrized weight enumerator (12.7) of C,
computed in [27], is:
swe_C(a, b, c) = a^24 + c^24 + 759(a^8c^16 + a^16c^8) + 2576a^12c^12
    + 12 144(a^2b^8c^14 + a^14b^8c^2) + 170 016(a^4b^8c^12 + a^12b^8c^4)
    + 765 072(a^6b^8c^10 + a^10b^8c^6) + 1 214 400a^8b^8c^8
    + 61 824(ab^12c^11 + a^11b^12c) + 1 133 440(a^3b^12c^9 + a^9b^12c^3)
    + 4 080 384(a^5b^12c^7 + a^7b^12c^5) + 24 288(b^16c^8 + a^8b^16)
    + 680 064(a^2b^16c^6 + a^6b^16c^2) + 1 700 160a^4b^16c^4 + 4096b^24.
Thus we see that C has minimum Hamming weight 8, minimum Lee weight 12, and
minimum Euclidean weight 16. The Gray image G(C) is a nonlinear (48, 2^24, 12) binary
code. Recall that the extended binary quadratic residue code of length 48 is a [48, 24, 12]
self-dual doubly-even code as discussed in Example 6.6.23; see also the related Research
Problem 9.3.7.
Exercise 735 Let C be one of the extended quadratic residue codes of length 24 presented
in Example 12.4.8.
(a) By examining swe_C(a, b, c), verify that C has minimum Hamming weight 8, minimum Lee weight 12, and minimum Euclidean weight 16.
(b) Give the Hamming weight distribution of G(C) from swe_C(a, b, c).
(c) There is a symmetry in the entries in swe_C(a, b, c); for instance, the number of codewords in C contributing to a^6b^8c^10 is the same as the number of codewords contributing to a^10b^8c^6. The presence of what codeword in C explains this symmetry?
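Part (b) of this exercise can be checked mechanically: under the Gray map, a codeword counted by a monomial a^i b^j c^k has a binary image of Hamming weight j + 2k (its Lee weight), so the weight distribution of G(C) is obtained by collecting coefficients of swe_C by j + 2k. The following is a sketch, with the term list transcribed from Example 12.4.8:

```python
# Collect the Hamming weight distribution of the Gray image G(C) from the
# symmetrized weight enumerator of Example 12.4.8.  A term coeff*a^i b^j c^k
# counts codewords with i zeros, j units (1 or 3), and k twos; the Gray image
# of such a codeword has Hamming weight j + 2k.
from collections import defaultdict

terms = [  # (coeff, i, j, k) read off from swe_C(a, b, c)
    (1, 24, 0, 0), (1, 0, 0, 24), (759, 8, 0, 16), (759, 16, 0, 8),
    (2576, 12, 0, 12), (12144, 2, 8, 14), (12144, 14, 8, 2),
    (170016, 4, 8, 12), (170016, 12, 8, 4), (765072, 6, 8, 10),
    (765072, 10, 8, 6), (1214400, 8, 8, 8), (61824, 1, 12, 11),
    (61824, 11, 12, 1), (1133440, 3, 12, 9), (1133440, 9, 12, 3),
    (4080384, 5, 12, 7), (4080384, 7, 12, 5), (24288, 0, 16, 8),
    (24288, 8, 16, 0), (680064, 2, 16, 6), (680064, 6, 16, 2),
    (1700160, 4, 16, 4), (4096, 0, 24, 0),
]

dist = defaultdict(int)
for coeff, i, j, k in terms:
    assert i + j + k == 24          # every monomial has total degree 24
    dist[j + 2 * k] += coeff

assert sum(dist.values()) == 4**12            # |C| = 4^12 codewords
assert min(w for w in dist if w > 0) == 12    # minimum Lee weight 12
```

The smallest nonzero weight is 12, matching the minimum Lee weight stated in the example.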
Exercise 736 Do the following:
(a) Show that a generator matrix for D̂_1 when p = 7 is

⎡ 3 3 3 3 3 3 3 3 ⎤
⎢ 0 1 2 2 3 2 3 3 ⎥
⎢ 0 3 1 2 2 3 2 3 ⎥
⎣ 0 3 3 1 2 2 3 2 ⎦ .

(b) Row reduce the matrix found in (a). How is this matrix related to the generator matrix for o8 found in Example 12.2.5?
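The claims of Theorem 12.4.7(i) can be spot-checked on the p = 7 matrix of Exercise 736(a). The rows below are transcribed from that exercise; the check itself is an illustrative sketch, not part of the text:

```python
# Rows transcribed from Exercise 736(a) (p = 7); we check self-orthogonality
# mod 4 and that every row has Euclidean weight divisible by 8, as
# Theorem 12.4.7(i) predicts for the extended quadratic residue code.
rows = [
    (3, 3, 3, 3, 3, 3, 3, 3),
    (0, 1, 2, 2, 3, 2, 3, 3),
    (0, 3, 1, 2, 2, 3, 2, 3),
    (0, 3, 3, 1, 2, 2, 3, 2),
]

def euclid(c):
    # Euclidean weight of a Z4-vector: entry x contributes min(x^2, (4-x)^2)
    return sum(min(x * x, (4 - x) ** 2) for x in c)

assert all(euclid(r) % 8 == 0 for r in rows)
assert all(sum(a * b for a, b in zip(r, s)) % 4 == 0 for r in rows for s in rows)
```

The all-3s row has Euclidean weight 8 and each cyclic-shift row has Euclidean weight 16, both multiples of 8.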
The automorphism groups of the extended Z4 -quadratic residue codes of length p + 1
possess some of the same automorphisms as the ordinary quadratic residue codes. Denote
the coordinates of the extended Z4-quadratic residue codes by {∞, F_p}. Then MAut(Dext), where Dext is either D̃_i or D̂_i, contains the translation automorphisms T_g for g ∈ F_p given by iT_g ≡ i + g (mod p) for all i ∈ F_p, the multiplier automorphisms µ_a for a ∈ Q_p, and an automorphism satisfying the Gleason–Prange Theorem of Section 6.6.4; see [27, 289]. In particular, MAut(Dext) is transitive. Call a vector with components in Z4 even-like if the sum of its components is 0 modulo 4 and odd-like otherwise. As in Theorem 1.7.13, the minimum Lee weight codewords in a Z4-quadratic residue code D_i are all odd-like.
12.5 Self-dual codes over Z4
We have noticed that codes such as the octacode and some of the extended Z4 -quadratic
residue codes are self-dual. In this section we will study this family of codes. Again much
of the study of self-dual codes over Z4 parallels that of self-dual codes over Fq . We have
already observed one important difference; namely, there are self-dual codes of odd length
over Z4 . For example, in Table 12.1 we found three self-dual cyclic codes of length 7.
By Theorem 12.1.5, the Euclidean weight of every codeword in a self-orthogonal code is
a multiple of 4. By Theorem 12.2.4, if the Euclidean weight of every codeword is a multiple
of 8, the code is self-orthogonal. This leads us to define Type I and Type II codes over Z4. A self-dual Z4-linear code is Type II if the Euclidean weight of every codeword is a multiple of 8.
We will see later that Type II codes exist only for lengths n ≡ 0 (mod 8). These codes also
contain a codeword with all coordinates ±1. For example, by Theorem 12.4.7, the extended
Z4 -quadratic residue codes of length p + 1 with p ≡ −1 (mod 8) are Type II. A self-dual
Z4 -linear code is Type I if the Euclidean weight of some codeword is not a multiple of 8.
There are Gleason polynomials for self-dual codes over Z4 analogous to those that arise
for self-dual codes over F2 , F3 , and F4 presented in Gleason’s Theorem of Section 9.2.
These polynomials can be found for the Hamming, symmetrized, and complete weight
enumerators; see [291] for these polynomials.
There is also an upper bound on the Euclidean weight of a Type I or Type II code over
Z4 . The proof of the following can be found in [26, 292]; compare this to Theorems 9.3.1,
9.3.5, and 9.4.14.
Theorem 12.5.1 Let C be a self-dual code over Z4 of length n. The following hold:
(i) If C is Type II, then the minimum Euclidean weight of C is at most 8 ⌊n/24⌋ + 8.
(ii) If C is Type I, then the minimum Euclidean weight of C is at most 8 ⌊n/24⌋ + 8 except
when n ≡ 23 (mod 24), in which case the bound is 8 ⌊n/24⌋ + 12. If equality holds in
this latter bound, then C is obtained by shortening³ a Type II code of length n + 1.
Codes meeting these bounds are called Euclidean-extremal.
Example 12.5.2 By Example 12.4.8 and Exercise 735, the extended Z4 -quadratic residue
codes of length 24 are Type II with minimum Euclidean weight 16. These are Euclidean-extremal codes.
The bounds of Theorem 12.5.1 are obviously bounds on the minimum Lee weight of self-dual codes, but highly unsatisfactory ones. No good bound on the minimum Lee weight is
currently known.
³ Shortening a Z4-linear code on a given coordinate is done as follows. If there are codewords that have every
value in Z4 in the given coordinate position, choose those codewords with only 0 or 2 in that coordinate position
and delete that coordinate to produce the codewords of the shortened code. If all codewords have only 0 or 2 in
the given coordinate position (and some codeword has 2 in that position), choose those codewords with only 0
in that coordinate position and delete that coordinate to produce the codewords of the shortened code. In each
case, the shortened code is linear with half as many codewords as the original. Furthermore, if the original code
is self-dual, so is the shortened code.
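The shortening procedure of this footnote can be sketched directly on a list of codewords. The helper below and the length-4 self-dual code used to exercise it are illustrative choices, not taken from the text:

```python
# Sketch of the footnote's shortening procedure for a Z4-linear code given
# as an explicit set of codeword tuples.
from itertools import product

def shorten(codewords, pos):
    """Shorten at coordinate pos: keep entries {0,2} if some codeword has a
    unit (1 or 3) there, otherwise keep only entries 0; then delete pos."""
    values = {c[pos] for c in codewords}
    keep = {0, 2} if (1 in values or 3 in values) else {0}
    return sorted(c[:pos] + c[pos + 1:] for c in codewords if c[pos] in keep)

def span(gens):
    n = len(gens[0])
    return {tuple(sum(a * g[i] for a, g in zip(cs, gens)) % 4 for i in range(n))
            for cs in product(range(4), repeat=len(gens))}

# A self-dual Z4-linear code of length 4 (16 codewords)
C = span([(1, 1, 1, 1), (0, 2, 0, 2), (0, 0, 2, 2)])
S = shorten(sorted(C), 0)

assert len(S) == len(C) // 2          # half as many codewords, as stated
# the shortened code is again self-orthogonal (here self-dual of length 3)
assert all(sum(x * y for x, y in zip(u, v)) % 4 == 0 for u in S for v in S)
```

Deleting the chosen coordinate cannot break orthogonality, because the deleted entries lie in {0, 2} and their products are 0 modulo 4.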
Research Problem 12.5.3 Give an improved upper bound on the minimum Lee weight of
self-dual codes over Z4 .
Exercise 737 By Theorem 12.1.5, the Lee weight of every codeword in a self-orthogonal
Z4 -linear code is even. It is natural to ask what happens in a code over Z4 in which all
codewords have Lee weight a multiple of 4.
(a) Give an example of a vector in Z_4^n of Lee weight 4 that is not orthogonal to itself.
(b) Unlike what occurs with codes where all Euclidean weights are multiples of 8, part (a)
indicates that a code over Z4 in which all codewords have Lee weight a multiple of 4
is not necessarily self-orthogonal. However, if C is a self-dual Z4 -linear code where all
codewords have Lee weight a multiple of 4, then G(C) is linear. Prove this. Hint: Let u, v ∈ C. By Theorem 12.2.2(i), wt_L(u − v) = d_L(u, v) = d(G(u), G(v)) = wt(G(u) + G(v)). As C is linear, u − v ∈ C. Also wt_L(u) = wt(G(u)) and wt_L(v) = wt(G(v)). Use these facts and wt(G(u) + G(v)) = wt(G(u)) + wt(G(v)) − 2 wt(G(u) ∩ G(v)) together
with our assumption to show G(u) and G(v) are orthogonal. Assessing the size of G(C),
show that the self-orthogonal binary linear code spanned by G(C) cannot be larger than
G(C).
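The identities used in this hint are easy to confirm exhaustively for the Gray map G (0→00, 1→01, 2→11, 3→10, applied componentwise). The following brute-force check is a sketch:

```python
# Exhaustive check of the Gray-map isometry wt_L(u - v) = d(G(u), G(v))
# over all pairs of vectors in Z4^2.
from itertools import product

GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

def gray(u):
    # Componentwise Gray map Z4^n -> F2^(2n)
    return tuple(bit for x in u for bit in GRAY[x])

def lee_weight(u):
    # Lee weight: entry x contributes min(x, 4 - x)
    return sum(min(x, 4 - x) for x in u)

for u in product(range(4), repeat=2):
    for v in product(range(4), repeat=2):
        diff = tuple((a - b) % 4 for a, b in zip(u, v))
        d_hamming = sum(x != y for x, y in zip(gray(u), gray(v)))
        assert lee_weight(diff) == d_hamming
```

This is exactly Theorem 12.2.2(i) specialized to length 2; the isometry is componentwise, so nothing changes for larger n.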
Let C be a Z4 -linear code of length n. There are two binary linear codes of length n
associated with C. The residue code Res(C) is µ(C). The torsion code Tor(C) is {b ∈ Fn2 |
2b ∈ C}. So the vectors in Tor(C) are obtained from the vectors in C with all components
0 or 2 by dividing these components in half. If C has generator matrix G in standard form
(12.1), then Res(C) and Tor(C) have generator matrices
G_Res = [ I_k1   A   B_1 ]                                      (12.13)

and

G_Tor = ⎡ I_k1   A    B_1 ⎤
        ⎣  O    I_k2   C  ⎦ .                                   (12.14)
So Res(C) ⊆ Tor(C). If C is self-dual, we have the following additional relationship between
Res(C) and Tor(C).
Theorem 12.5.4 If C is a self-dual Z4 -linear code, then Res(C) is doubly-even and
Res(C) = Tor(C)⊥ .
Proof: Suppose C has generator matrix G given by (12.1). Denote one of the first k1
rows of G by r = (i, a, b1 + 2b2 ), where i, a, b1 , and b2 are rows of Ik1 , A, B1 , and
B2 , respectively. Then 0 ≡ r · r ≡ i · i + a · a + b1 · b1 (mod 4), implying that the rows of
(12.13) are doubly-even and so are orthogonal to themselves. If r′ = (i′ , a′ , b′1 + 2b′2 ) is
another of the first k1 rows of G, then
r · r′ ≡ i · i′ + a · a′ + b_1 · b′_1 + 2(b_2 · b′_1 + b_1 · b′_2) (mod 4).        (12.15)
If s = (0, 2i′ , 2c) is one of the bottom k2 rows of G, using analogous notation, then
r · s ≡ 2a · i′ + 2b_1 · c (mod 4).                                                (12.16)
As r · r′ ≡ r · s ≡ 0 (mod 4), (12.15) and (12.16) imply that the rows of (12.13) are orthogonal as binary vectors to the rows of (12.14). Thus Res(C) ⊆ Tor(C)⊥ ; as C is self-dual,
2k1 + k2 = n, implying Res(C) = Tor(C)⊥ . In particular, Res(C) is self-orthogonal and as
it has a generator matrix with doubly-even rows, Res(C) is doubly-even.
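Theorem 12.5.4 can be checked numerically on a small self-dual code; the length-4 code below is an illustrative choice, not one singled out in the text:

```python
# Check Res(C) ⊆ Tor(C), Res(C) doubly-even, and Res(C) = Tor(C)^perp for a
# small self-dual Z4-linear code of length 4.
from itertools import product

gens = [(1, 1, 1, 1), (0, 2, 0, 2), (0, 0, 2, 2)]
C = {tuple(sum(a * g[i] for a, g in zip(cs, gens)) % 4 for i in range(4))
     for cs in product(range(4), repeat=3)}

Res = {tuple(x % 2 for x in c) for c in C}                 # mu(C)
Tor = {b for b in product(range(2), repeat=4)
       if tuple(2 * x % 4 for x in b) in C}                # {b : 2b in C}

assert Res <= Tor
assert all(sum(b) % 4 == 0 for b in Res)                   # doubly-even
TorDual = {v for v in product(range(2), repeat=4)
           if all(sum(a * b for a, b in zip(v, t)) % 2 == 0 for t in Tor)}
assert Res == TorDual                                      # Res(C) = Tor(C)^perp
```

Here Res(C) is the binary repetition code and Tor(C) is the even-weight code of length 4, which are indeed binary duals of each other.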
One consequence of this result is that a Type II code must contain a codeword with all
entries ±1. This implies that Type II codes can exist only for lengths a multiple of 8. The
simple proof we give of this was originally due to Gaborit.
Corollary 12.5.5 Let C be a Type II code of length n. Then Tor(C) is an even binary code,
Res(C) contains the all-one binary vector, C contains a codeword with all entries ±1, and
n ≡ 0 (mod 8).
Proof: Let c ∈ Tor(C). Then 2c ∈ C; as wt E (2c) ≡ 0 (mod 8), c is a binary vector of even
weight. Hence Tor(C) is an even binary code. So Res(C) contains the all-one binary vector
1 as Res(C) = Tor(C)⊥ by Theorem 12.5.4. Any vector v ∈ C with µ(v) = 1 has no entries
0 or 2; at least one such v exists. As 0 ≡ wt E (v) ≡ n (mod 8), the proof is complete.
If we start with an arbitrary doubly-even binary code, can we form a self-dual Z4 -linear
code with this binary code as its residue code? This is indeed possible, a fact we explore in
the next section; but in order to do so, we need another form for the generator matrix.
Corollary 12.5.6 Let C be a self-dual Z4 -linear code with generator matrix in standard
form. Then C has a generator matrix of the form
G′ = ⎡ F    I_k + 2B ⎤
     ⎣ 2H      O     ⎦ ,

where B, F, and H are binary matrices. Furthermore, generator matrices for Res(C) and Tor(C) are

G′_Res = [ F   I_k ]   and   G′_Tor = ⎡ F   I_k ⎤
                                      ⎣ H    O  ⎦ ,
respectively.
Proof: Let k = k1 where C has type 4^k1 2^k2. As C is self-dual, k2 = n − 2k. We first show that C has a generator matrix of the form

G′′ = ⎡ D      E        I_k + 2B ⎤
      ⎣ O   2I_{n−2k}      2C    ⎦ ,

where B, C, D, and E are binary matrices. To verify this, we only need to show that we can replace the first k = k1 rows of G from (12.1) by [D  E  I_k + 2B]. By Theorems 12.5.4 and 1.6.2, the right-most k coordinates of Res(C) are information positions, as the left-most n − k coordinates of Tor(C) are information positions. Thus by (12.13), B_1 has binary rank k and so has a binary inverse D. Hence the first k rows of G can be replaced by D[I_k  A  B_1 + 2B_2] = [D  E + 2E_1  I_k + 2B_3], where DA = E + 2E_1 and D(B_1 + 2B_2) = I_k + 2B_3. Adding E_1[O  2I_{n−2k}  2C] to this gives G′′. Now add 2C[D  E  I_k + 2B] to the bottom n − 2k rows of G′′ to obtain G′. That Res(C) and Tor(C) have the stated generator matrices follows from the definitions of the residue and torsion codes.
12.5.1 Mass formulas
We can use the form of the generator matrix for a self-dual Z4 -linear code given in
Corollary 12.5.6 to count the total number of such codes. Let C be any self-dual Z4 -linear
code of type 4^k 2^{n−2k} with 0 ≤ k ≤ ⌊n/2⌋. By Theorem 12.5.4, Res(C) is an [n, k] self-orthogonal doubly-even binary code, and we can apply a permutation matrix P to C so that
the generator matrix G ′ of C P is given in Corollary 12.5.6. Conversely, if we begin with an
[n, k] self-orthogonal doubly-even binary code with generator matrix [F Ik ] that generates
Res(C P), we must be able to find the binary matrix B to produce the first k rows of G ′ . While
the submatrix H in G ′ is not unique, its span is uniquely determined as Res(C)⊥ = Tor(C).
This produces a generator matrix for C P and from there we can produce C. So to count the
total number of codes C with a given residual code, we only need to count the number of
choices for B.
Beginning with a generator matrix [F  I_k], which generates an [n, k] self-orthogonal doubly-even binary code C_1, choose H so that

⎡ F   I_k ⎤
⎣ H    O  ⎦

generates C_2 = C_1^⊥. We now show that there are 2^{k(k+1)/2} choices for B that yield self-dual Z4-linear codes with generator matrices of the form G′ given in Corollary 12.5.6. As the inner product modulo 4 of vectors whose components are only 0s and 2s is always 0, the inner product of two of the bottom n − 2k rows of G′ is 0. Regardless of our choice of B, the inner product modulo 4 of one of the top k rows of G′ with one of the bottom n − 2k rows of G′ is also always 0 as C_1^⊥ = C_2. Furthermore, the inner product modulo 4 of one of the top k rows of G′ with itself is 0 regardless of the choice of B because C_1 is doubly-even.
Let B = [b_{i,j}] with 1 ≤ i ≤ k, 1 ≤ j ≤ k. Choose the entries of B on or above the diagonal (that is, b_{i,j} with 1 ≤ i ≤ j ≤ k) arbitrarily; notice that we are freely choosing k(k + 1)/2 entries. We only need to make sure that the inner product of row i and row j of G′, with 1 ≤ i < j ≤ k, is 0 modulo 4. But this inner product modulo 4 is

f_i · f_j + 2(b_{i,j} + b_{j,i}),                                  (12.17)

where f_i and f_j are rows i and j of F. As C_1 is self-orthogonal, f_i · f_j ≡ 0 (mod 2), implying that we can solve (12.17) for the binary value b_{j,i} uniquely so that the desired inner product
is 0 modulo 4. This proves the following result, first shown in [92].
Theorem 12.5.7 For 0 ≤ k ≤ ⌊n/2⌋, there are ν_{n,k} 2^{k(k+1)/2} self-dual codes over Z4 of length n and type 4^k 2^{n−2k}, where ν_{n,k} is the number of [n, k] self-orthogonal doubly-even binary codes. The total number of self-dual codes over Z4 of length n is

Σ_{k=0}^{⌊n/2⌋} ν_{n,k} 2^{k(k+1)/2}.
Exercise 738 Do the following:
(a) Fill in each entry ⋆ with 0 or 2 so that the resulting generator matrix yields a self-dual code:

G′ = ⎡ 0 1 1 1 3 2 0 2 ⎤
     ⎢ 1 0 1 1 ⋆ 1 2 0 ⎥
     ⎢ 1 1 0 1 ⋆ ⋆ 3 0 ⎥
     ⎣ 1 1 1 0 ⋆ ⋆ ⋆ 1 ⎦ .

(b) Fill in a 4 × 6 matrix H with entries 0 or 1 so that

⎡ F    I_2 + 2B ⎤
⎣ 2H       O    ⎦

generates a self-dual code over Z4 of length 8 and type 4² 2⁴, where

[ F   I_2 + 2B ] = ⎡ 1 1 1 0 0 0 3 0 ⎤
                   ⎣ 0 1 1 1 0 0 2 1 ⎦ .
A similar argument allows us to count the number of Type II codes. Note that by Corollary 12.5.5, if C is Type II of length n, then n ≡ 0 (mod 8) and Res(C) contains the all-one
vector. For a proof of the following, see [92, 276] or Exercise 739.
Theorem 12.5.8 If n ≡ 0 (mod 8), there are µ_{n,k} 2^{1+k(k−1)/2} Type II codes over Z4 of length n and type 4^k 2^{n−2k} for 0 ≤ k ≤ n/2, where µ_{n,k} is the number of [n, k] self-orthogonal doubly-even binary codes containing 1. The total number of Type II codes over Z4 of length n is

Σ_{k=0}^{n/2} µ_{n,k} 2^{1+k(k−1)/2}.
Note that in the proof of Theorem 9.12.5, we gave a recurrence relation for µ_{n,k}; namely, µ_{n,1} = 1 and

µ_{n,k+1} = ((2^{n−2k−1} + 2^{n/2−k−1} − 1)/(2^k − 1)) µ_{n,k}   for k ≥ 1.
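As a sanity check, the recurrence can be evaluated for n = 8. The sketch below (the helper name is ours) reproduces µ_{8,4} = 30, the number of doubly-even self-dual binary [8, 4] codes, and then evaluates the Type II total of Theorem 12.5.8:

```python
# Evaluate the recurrence for mu_{n,k} and the Type II count of
# Theorem 12.5.8 for n = 8.
def mu(n, k):
    """Number of [n,k] self-orthogonal doubly-even binary codes containing 1,
    via the recurrence above (n divisible by 8, k >= 1)."""
    m = 1                                   # mu_{n,1} = 1
    for j in range(1, k):                   # step mu_{n,j} -> mu_{n,j+1}
        m = m * (2**(n - 2*j - 1) + 2**(n//2 - j - 1) - 1) // (2**j - 1)
    return m

assert [mu(8, k) for k in range(1, 5)] == [1, 35, 105, 30]

# Total number of Type II codes over Z4 of length 8, per Theorem 12.5.8
# (the k = 0 term vanishes: the zero code does not contain 1).
total = sum(mu(8, k) * 2**(1 + k*(k - 1)//2) for k in range(1, 5))
```

The division in the recurrence is exact at every step, so integer arithmetic suffices.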
Exercise 739 Suppose F is a k × (n − k) binary matrix such that the binary code R of length n ≡ 0 (mod 8) generated by [F  I_k] is doubly-even and contains the all-one vector. Suppose H is an (n − 2k) × (n − k) binary matrix such that

⎡ F   I_k ⎤
⎣ H    O  ⎦

generates the binary dual of R. By doing the following, show that there are exactly 2^{1+k(k−1)/2} k × k binary matrices B = [b_{i,j}] such that

G′ = ⎡ F    I_k + 2B ⎤
     ⎣ 2H       O    ⎦
generates a Type II Z4 -linear code. This will prove the main part of Theorem 12.5.8. Choose
the entries of B in rows 2 ≤ i ≤ k on or above the main diagonal arbitrarily. Also choose
the entry b1,1 arbitrarily. Thus 1 + k(k − 1)/2 entries have been chosen arbitrarily. This
exercise will show that a fixed choice for these entries determines all other entries uniquely
so that G ′ generates a Type II Z4 -linear code.
(a) Show that every row of H has an even number of ones and that the rows of [2H O]
have Euclidean weight a multiple of 8.
(b) Show that any row of [2H O] is orthogonal to any row of G ′ .
(c) Let 2 ≤ i ≤ k. Show that if all the entries in row i of B except b_{i,1} are fixed, we can uniquely choose b_{i,1} so that the Euclidean weight of row i of G′ is a multiple of 8. Hint: First show that if all the entries of row i of B including b_{i,1} are fixed, the Euclidean weight of row i of G′ is either 0 or 4 modulo 8.
(d) The entries b_{1,1} and b_{i,j} for 2 ≤ i ≤ j ≤ k have been fixed, and part (c) shows that b_{i,1} is determined for 2 ≤ i ≤ k. Show that the remaining entries of B are uniquely determined by the requirement that the rows of G′ are to be orthogonal to each other. Hint: Use (12.17) noting that f_i · f_j ≡ 0 (mod 2).
(e) We now show that G ′ generates a Type II Z4 -linear code. At this point we know that
G ′ generates a self-dual Z4 -linear code where all rows, except possibly the first, have
Euclidean weight a multiple of 8. Let G ′′ be a matrix identical to G ′ except with a
different first row. Let the first row of G ′′ be the sum of rows 1, 2, . . . , k of G ′ . Note
that G ′ and G ′′ generate the same code and that the code is Type II if we show the first
row of G ′′ has Euclidean weight a multiple of 8. Show that this is indeed true. Hint:
The first row of G ′′ modulo 2 must be in the binary code R, which contains the all-one
vector. Modulo 2 what is this row? Show that this row has Euclidean weight n ≡ 0
(mod 8).
(f) Let

G′ = ⎡ 0 1 1 1 3 ⋆ ⋆ ⋆ ⎤
     ⎢ 1 0 1 1 ⋆ 1 2 0 ⎥
     ⎢ 1 1 0 1 ⋆ ⋆ 3 0 ⎥
     ⎣ 1 1 1 0 ⋆ ⋆ ⋆ 1 ⎦ .

Fill in each entry ⋆ with 0 or 2 so that G′ generates a Type II code.
In [59, 85], the self-dual Z4 -linear codes of length n with 1 ≤ n ≤ 15 are completely
classified; in addition, the Type II codes of length 16 are classified in [276]. The mass
formula together with the size of the automorphism groups of the codes is used, as was
done in the classification of self-dual codes over Fq, to ensure that the classification is complete.
The heart of the classification is to compute the indecomposable codes. Recall that a code
is decomposable if it is equivalent to a direct sum of two or more nonzero subcodes, called
components, and indecomposable otherwise. If a self-dual code is decomposable, each of
its component codes must be self-dual. Furthermore, the components of a decomposable
Type II code must also be Type II codes. Recall that by Exercise 685, the only self-dual
code over Z4 of length 1 is the code we denote A1 with generator matrix [ 2 ]. In that
Table 12.2 Indecomposable self-dual codes over Z4 of length 1 ≤ n ≤ 16

 n   Type I   Type II     n   Type I   Type II     n   Type I   Type II
 1      1        0         7      1        0       13      7        0
 2      0        0         8      2        4       14     92        0
 3      0        0         9      0        0       15    111        0
 4      1        0        10      4        0       16      ?      123
 5      0        0        11      2        0
 6      1        0        12     31        0
exercise you deduced that, up to equivalence, the only self-dual codes of lengths 2 and 3
are 2A1 and 3A1 , respectively, where 2A1 is the direct sum of two copies of A1 . You also
discovered that, up to equivalence, there are two inequivalent self-dual codes of length 4:
the decomposable code 4A1 and the indecomposable code D4 with generator matrix
⎡ 1 1 1 1 ⎤
⎢ 0 2 0 2 ⎥
⎣ 0 0 2 2 ⎦ .
Table 12.2 contains the number of inequivalent indecomposable Type I and Type II codes
of length n with 1 ≤ n ≤ 16; this data is taken from [59, 85, 276]. While the real work in
the classification is to find the indecomposable codes, using the mass formula requires a
list of both the indecomposable and the decomposable codes.
Exercise 740 Let C be a self-orthogonal code of length n over Z4 . Suppose that C contains
a codeword with one component equal to 2 and the rest equal to 0. Prove that C is equivalent
to a direct sum of A1 and a self-dual code of length n − 1.
Exercise 741 Let C be equivalent to a direct sum C 1 ⊕ C 2 of the indecomposable codes C 1
and C 2 . Also let C be equivalent to a direct sum D1 ⊕ D2 of the indecomposable codes D1
and D2 . Prove that C 1 is equivalent to either D1 or D2 , while C 2 is equivalent to the other
Di .
Exercise 742 Construct one indecomposable self-dual code of length 6 over Z4 . Also construct one indecomposable self-dual code of length 7. Hint: For length 7, check Table 12.1.
Table 12.3 presents the total number of inequivalent Type I and Type II codes of length n
with 1 ≤ n ≤ 16. The Type II codes represented in this table all have minimum Euclidean
weight 8 and hence are Euclidean-extremal by Theorem 12.5.1.
Example 12.5.9 From Table 12.2, there are four indecomposable Type II codes of length
8 and two indecomposable Type I codes. The decomposable self-dual codes of length 8 are
8A1 , 4A1 ⊕ D4 , 2A1 ⊕ E 6 , A1 ⊕ E 7 , and 2D4 , where E 6 and E 7 are the indecomposable
codes of lengths 6 and 7, respectively, constructed in Exercise 742. All the decomposable
Table 12.3 Self-dual codes over Z4 of length 1 ≤ n ≤ 16

 n   Type I   Type II     n   Type I   Type II     n   Type I   Type II
 1      1        0         7      4        0       13     66        0
 2      1        0         8      7        4       14    170        0
 3      1        0         9     11        0       15    290        0
 4      2        0        10     16        0       16      ?      133
 5      2        0        11     19        0
 6      3        0        12     58        0
codes are of Type I. The five decomposable codes are also all inequivalent to each other by
a generalization of Exercise 741. This verifies the entries for length 8 in Table 12.3.
Exercise 743 Assuming the numbers in Table 12.2, verify that the numbers in Table 12.3
are correct for lengths 10, 11, 12, and 16.
12.5.2 Self-dual cyclic codes
We have observed numerous times that self-dual cyclic codes over Z4 exist. Theorem 12.3.13
gives a pair of generating polynomials for cyclic codes. The next theorem, found in [283],
gives conditions on these polynomials that lead to self-dual codes.
Theorem 12.5.10 Let C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩ be a cyclic code over Z4 of odd length n, where f(x), g(x), and h(x) are monic polynomials such that x^n − 1 = f(x)g(x)h(x). Then C is self-dual if and only if f(x) = h^∗(x) and g(x) = g^∗(x).
Proof: We first remark that the constant term of any irreducible factor of x n − 1 is not
equal to either 0 or 2. In particular, by definition, f ∗ (x), g ∗ (x), and h ∗ (x) are all monic and
f ∗ (x)g ∗ (x)h ∗ (x) = x n − 1.
Suppose that f(x) = h^∗(x) and g(x) = g^∗(x). By Theorem 12.3.20, C^⊥ = ⟨h^∗(x)g^∗(x)⟩ ⊕ ⟨2h^∗(x)f^∗(x)⟩ = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩ = C and C is self-dual.
Now assume that C is self-dual. Since C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩ and C^⊥ = ⟨h^∗(x)g^∗(x)⟩ ⊕ ⟨2h^∗(x)f^∗(x)⟩ and these decompositions are unique, we have

f(x)g(x) = h^∗(x)g^∗(x)                                           (12.18)

and

f(x)h(x) = h^∗(x)f^∗(x).                                          (12.19)

From (12.18), x^n − 1 = f(x)g(x)h(x) = h^∗(x)g^∗(x)h(x). As f^∗(x)g^∗(x)h^∗(x) = x^n − 1, we have h^∗(x)g^∗(x)h(x) = h^∗(x)g^∗(x)f^∗(x) = x^n − 1. By the unique factorization of x^n − 1 into monic irreducible polynomials, f^∗(x) = h(x). Similarly, using (12.19), x^n − 1 = f(x)h(x)g(x) = h^∗(x)f^∗(x)g(x) = h^∗(x)f^∗(x)g^∗(x), implying that g(x) = g^∗(x).
Example 12.5.11 In Exercise 714, we found that the factorization of x^15 − 1 over Z4 is given by

x^15 − 1 = g_1(x)g_2(x)g_3(x)g_4(x)g_4^∗(x),
where g_1(x) = x − 1, g_2(x) = x^4 + x^3 + x^2 + x + 1, g_3(x) = x^2 + x + 1, and g_4(x) = x^4 + 2x^2 + 3x + 1. Let C = ⟨f(x)g(x)⟩ ⊕ ⟨2f(x)h(x)⟩ be a self-dual cyclic code of length 15. By Theorem 12.5.10, f^∗(x) = h(x) and g^∗(x) = g(x). This implies that if g(x) contains a given factor, it must also contain the reciprocal polynomial of that factor. Also if an irreducible factor of x^15 − 1 is its own reciprocal polynomial, it must be a factor of g(x). Thus there are only the following possibilities:
(a) f(x) = g_4(x), g(x) = g_1(x)g_2(x)g_3(x), and h(x) = g_4^∗(x),
(b) f(x) = g_4^∗(x), g(x) = g_1(x)g_2(x)g_3(x), and h(x) = g_4(x), and
(c) f(x) = 1, g(x) = x^15 − 1, and h(x) = 1.
The codes in (a) and (b) are of type 4^4 2^7; they are equivalent under the multiplier µ_{−1}. The code in (c) is the trivial self-dual code with generator matrix 2I_15.
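The factorization of x^15 − 1 and the conditions of Theorem 12.5.10 for case (a) can be verified by direct polynomial arithmetic over Z4. The sketch below represents polynomials as coefficient lists, lowest degree first:

```python
# Verify the factorization of x^15 - 1 over Z4 and the self-duality
# conditions f = h*, g = g* for case (a) of Example 12.5.11.
def pmul(a, b, m=4):
    """Product of two coefficient lists, reduced mod m."""
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % m
    return r

def recip(a):
    """Monic reciprocal polynomial a*(x) = x^deg(a) a(1/x), made monic."""
    r = a[::-1]
    inv = pow(r[-1], -1, 4)          # leading coefficient is a unit in Z4
    return [(inv * c) % 4 for c in r]

g1 = [3, 1]                          # x - 1  (i.e. x + 3 over Z4)
g2 = [1, 1, 1, 1, 1]                 # x^4 + x^3 + x^2 + x + 1
g3 = [1, 1, 1]                       # x^2 + x + 1
g4 = [1, 3, 2, 0, 1]                 # x^4 + 2x^2 + 3x + 1

prod = g1
for poly in (g2, g3, g4, recip(g4)):
    prod = pmul(prod, poly)
assert prod == [3] + [0] * 14 + [1]  # x^15 - 1 = x^15 + 3 over Z4

f, g, h = g4, pmul(pmul(g1, g2), g3), recip(g4)
assert f == recip(h) and g == recip(g)   # conditions of Theorem 12.5.10
```

Note that `pow(x, -1, 4)` (modular inverse) requires Python 3.8 or later.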
Exercise 744 Consider the cyclic code C of length 15 with generator polynomials given
by (a) in Example 12.5.11.
(a) Write down a 4 × 15 generator matrix for Res(C). Note that Res(C) is a cyclic code
with generator polynomial µ( f (x)g(x)).
(b) The generator matrix from part (a) is the parity check matrix for Tor(C) by
Theorem 12.5.4. What well-known code is Tor(C)? What well-known code is
Res(C)?
Exercise 745 The factorization of x^21 − 1 over Z4 is given by

x^21 − 1 = g_1(x)g_2(x)g_3(x)g_3^∗(x)g_4(x)g_4^∗(x),

where g_1(x) = x − 1, g_2(x) = x^2 + x + 1, g_3(x) = x^6 + 2x^5 + 3x^4 + 3x^2 + x + 1, and g_4(x) = x^3 + 2x^2 + x + 3.
(a) As in Example 12.5.11, list all possible triples ( f (x), g(x), h(x)) that lead to self-dual
cyclic codes of length 21 over Z4 . Note: There are nine such codes including the trivial
self-dual code with generator matrix 2I21 .
(b) Give the types of each code found in (a).
12.5.3 Lattices from self-dual codes over Z4
In Section 10.6 we introduced the concept of a lattice and described how to construct
lattices from codes, particularly self-dual binary codes. One method to construct lattices
from binary codes described in that section is called Construction A. There is a Z4 -analogue
called Construction A4 , beginning with a Z4 -linear code C of length n. Construction A4
produces a lattice 4 (C), which is the set of all x in Rn obtained from a codeword in C by
viewing the codeword as an integer vector with 0s, 1s, 2s, and 3s, adding integer multiples
of 4 to any components, and dividing the resulting vector by 2. In particular,
4 (C) = {x ∈ Rn | 2x (mod 4) ∈ C}.
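The defining condition can be exercised numerically. The code below uses an illustrative length-4 self-dual code (not one from the text), enumerates lattice points in a small box, and checks membership and the minimum norm, which matches d_E/4 for this code (see Theorem 12.5.12):

```python
# Numeric check of Construction A_4 on a small self-dual code with minimum
# Euclidean weight d_E = 4.
from itertools import product

gens = [(1, 1, 1, 1), (0, 2, 0, 2), (0, 0, 2, 2)]
C = {tuple(sum(a * g[i] for a, g in zip(cs, gens)) % 4 for i in range(4))
     for cs in product(range(4), repeat=3)}

# Lattice points x = (c + 4t)/2 for codewords c and small integer shifts t
pts = [tuple((ci + 4 * ti) / 2 for ci, ti in zip(c, t))
       for c in C for t in product((-1, 0, 1), repeat=4)]
assert all(tuple((2 * x) % 4 for x in p) in C for p in pts)   # 2x mod 4 in C

min_norm = min(sum(x * x for x in p) for p in pts if any(p))
assert min_norm == 1                                          # = d_E / 4
```

The minimum-norm points come from the codeword (1, 1, 1, 1), scaled to (±1/2, ±1/2, ±1/2, ±1/2) after shifting by multiples of 4 and halving.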
If the generator matrix G for C is written in standard form (12.1), then the generator matrix M for Λ4(C) is

M = (1/2) ⎡ I_k1    A       B_1 + 2B_2   ⎤
          ⎢  O    2I_k2        2C        ⎥                    (12.20)
          ⎣  O      O     4I_{n−k1−k2}   ⎦ .
The following is the analogue of Theorem 10.6.4.
Theorem 12.5.12 Let C be a Z4-linear code of length n and minimum Euclidean weight d_E. The following hold:
(i) If d_E ≤ 16, the minimum norm µ of Λ4(C) is µ = d_E/4; if d_E > 16, µ = 4.
(ii) det Λ4(C) = 4^{n−2k1−k2}.
(iii) Λ4(C^⊥) = Λ4(C)^∗.
(iv) Λ4(C) is integral if and only if C is self-orthogonal.
(v) Λ4(C) is Type I if and only if C is Type I.
(vi) Λ4(C) is Type II if and only if C is Type II.
Proof: The proofs of (i) and (vi) are left as an exercise.
For part (ii), recall that det Λ4(C) = det A, where A is the Gram matrix MM^T. Using M as in (12.20),

det Λ4(C) = (det M)² = ((1/2^n) · 1^{k1} · 2^{k2} · 4^{n−k1−k2})² = 4^{n−2k1−k2},

verifying (ii).
The generator matrix G^⊥ for C^⊥ is given by (12.2). From this we see that the generator matrix M^⊥ of Λ4(C^⊥) is

M^⊥ = (1/2) ⎡ −(B_1 + 2B_2)^T − C^T A^T    C^T    I_{n−k1−k2} ⎤
            ⎢          2A^T              2I_{k2}       O      ⎥
            ⎣         4I_{k1}               O          O      ⎦ .

The inner product (in Z4) of a row of G from (12.1) with a row of G^⊥ from (12.2) is 0; hence we see that the (real) inner product of a row of M and a row of M^⊥ is an integer. This proves that M^⊥M^T is an integer matrix and Λ4(C^⊥) ⊆ Λ4(C)^∗. To complete (iii), we must show Λ4(C)^∗ ⊆ Λ4(C^⊥). Let y ∈ Λ4(C)^∗. Then yM^T ∈ Z^n by Theorem 10.6.3(i). So there exists z ∈ Z^n with y = z(M^T)^{−1} = z(M^T)^{−1}(M^⊥)^{−1}M^⊥ = z(M^⊥M^T)^{−1}M^⊥. As M^⊥M^T is an integer matrix and det(M^⊥M^T) = det(M^⊥) det(M^T) = (±2^{−n} · 4^{k1} · 2^{k2}) · (2^{−n} · 2^{k2} · 4^{n−k1−k2}) = ±1,

(M^⊥M^T)^{−1} = (1/det(M^⊥M^T)) adj(M^⊥M^T)

is an integer matrix. Thus y = z′M^⊥ for some z′ ∈ Z^n. Hence y ∈ Λ4(C^⊥), completing (iii).
The (real) inner product of two rows of M is always an integer if and only if the inner
product (in Z4 ) of two rows of G is 0, proving (iv).
Suppose Λ4(C) is Type I; then det Λ4(C) = ±1, implying n = 2k1 + k2 by (ii). As Λ4(C) is Type I, it is integral, and thus C ⊆ C^⊥ by (iv). Hence C is self-dual. The (real) inner product
of some lattice point in Λ4(C) with itself is an odd integer; the corresponding codeword must have Euclidean weight 4 modulo 8, implying that C is Type I. Conversely, suppose C is Type I. By (iii), Λ4(C) = Λ4(C^⊥) = Λ4(C)^∗. Furthermore, Λ4(C) is integral by (iv). Thus by definition Λ4(C) is unimodular. In addition, a codeword in C with Euclidean weight 4 modulo 8 corresponds to a lattice point with odd integer norm. This verifies (v).
Exercise 746 Prove Theorem 12.5.12(i) and (vi).
Example 12.5.13 By Example 12.2.5, the octacode o8 is a Type II code with minimum Euclidean weight 8. By Theorem 12.5.12, Λ4(o8) is a Type II lattice in R^8 with minimum norm 2. This is the Gosset lattice E_8 introduced in Example 10.6.5, which we recall is the unique Type II lattice in R^8. From that example we saw that E_8 has precisely 240 lattice points of minimum norm 2. These can be found in Λ4(o8) as follows. The symmetrized weight enumerator of o8 is

swe_{o8}(a, b, c) = a^8 + 16b^8 + c^8 + 14a^4c^4 + 112a^3b^4c + 112ab^4c^3.

The codewords of o8 that are of Euclidean weight 8 are the 16 codewords which have all components ±1 and the 112 codewords with four components ±1 and one component equal to 2. The former yield 16 lattice points of "shape" (1/2)(±1^8), indicating a vector with eight entries equal to ±1/2. The latter yield 112 lattice points of shape (1/2)(0^3, ±1^4, 2); to each of these 112 lattice points, four can be subtracted from the component that equaled 2, yielding 112 lattice points of shape (1/2)(0^3, ±1^4, −2). (Note that this is not the same version of E_8 described in Example 10.6.5. However, the two forms are equivalent.)
Exercise 747 Let C be one of the extended quadratic residue codes of length 24 presented in Example 12.4.8. It is a Type II code of minimum Euclidean weight 16. By Theorem 12.5.12, Λ4(C) is a Type II lattice in R^24 with minimum norm 4. This lattice is equivalent to the Leech lattice Λ24 discussed in Section 10.6. Recall that Λ24 has 196 560 lattice points of minimum norm 4. Using swe_C(a, b, c), describe how these minimum norm lattice points arise from the codewords in C.
12.6 Galois rings
When studying cyclic codes over F_q, we often examine extension fields of F_q for a variety of reasons. For example, the roots of x^n − 1, and hence the roots of the generator polynomial of a cyclic code, are in some extension field of F_q. The extension field is often used to present a form of the parity check matrix; see Theorem 4.4.3. The analogous role for codes over Z4 is played by Galois rings. We will describe properties of these rings, mostly without proof; see [116, 230].
Let f(x) be a monic basic irreducible polynomial of degree r. By Lemmas 12.3.9 and 12.3.10, Z4[x]/( f(x)) is a ring with 4^r elements and only one nontrivial ideal. (Recall that a finite field has no nontrivial ideals.) Such a ring is called a Galois ring. It turns out that all Galois rings of the same order are isomorphic (just as all finite fields of the same order are isomorphic), and we denote a Galois ring of order 4^r by GR(4^r). Just as the field F_q
contains the subfield F_p, where p is the characteristic of F_q, GR(4^r) is a ring of characteristic 4 containing the subring Z4.
The nonzero elements of a finite field form a cyclic group; a generator of this group is
called a primitive element. The multiplicative structure of the field is most easily expressed
using powers of a fixed primitive element. This is not the case with Galois rings, but an
equally useful structure is present. We will use the following theorem, which we do not
prove, along with its associated notation, throughout the remainder of this chapter without
reference.
Theorem 12.6.1 The Galois ring R = GR(4^r) contains an element ξ of order 2^r − 1. Every element c ∈ R can be uniquely expressed in the form c = a + 2b, where a and b are elements of T(R) = {0, 1, ξ, ξ^2, . . . , ξ^{2^r−2}}.
The element ξ is called a primitive element; the expression c = a + 2b with a and b in
T (R) is called the 2-adic representation of c. The lone nontrivial ideal (2) is 2R = {2t |
t ∈ T (R)}; the elements of 2R consist of 0 together with all the divisors of zero in R. The
elements of R \ 2R are precisely the invertible elements of R.
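These facts are concrete enough to check by machine in a small case. The sketch below is our own illustration, not part of the text: it builds GR(4^3) = Z4[x]/(f(x)) using the basic irreducible polynomial f(x) = x^3 + 2x^2 + x − 1 from Example 12.3.8, and verifies that ξ = x + (f(x)) has order 2^3 − 1 = 7 and that the 2-adic representations a + 2b with a, b ∈ T(R) hit all 4^3 = 64 ring elements exactly once.

```python
from itertools import product

MOD, DEG = 4, 3
# f(x) = x^3 + 2x^2 + x - 1 over Z4, so x^3 = 1 + 3x + 2x^2 in the quotient ring.
RED = (1, 3, 2)

def mul(u, v):
    """Multiply two elements of Z4[x]/(f(x)), stored as length-3 coefficient tuples."""
    w = [0] * (2 * DEG - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] = (w[i + j] + ui * vj) % MOD
    for k in range(2 * DEG - 2, DEG - 1, -1):  # replace x^4, then x^3
        c, w[k] = w[k], 0
        for t, r in enumerate(RED):
            w[k - DEG + t] = (w[k - DEG + t] + c * r) % MOD
    return tuple(w[:DEG])

one, xi = (1, 0, 0), (0, 1, 0)
powers, p = [one], mul(one, xi)
while p != one:                        # xi^0, xi^1, ... until the powers cycle
    powers.append(p)
    p = mul(p, xi)
print(len(powers))                     # 7, the order 2^3 - 1 promised by Theorem 12.6.1

T = [(0, 0, 0)] + powers               # the set T(R)
add = lambda u, v: tuple((x + y) % MOD for x, y in zip(u, v))
reps = {add(a, tuple(2 * c % MOD for c in b)) for a, b in product(T, T)}
print(len(reps))                       # 64 = 4^3, so c = a + 2b is a bijection
```

Since 8 choices of a and 8 choices of b produce 64 distinct sums, uniqueness of the 2-adic representation follows by counting.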
Exercise 748 Let R = GR(4^r) have primitive element ξ. Using the fact that every element c ∈ R is uniquely expressible in the form c = a + 2b, where a and b are in T(R) = {0, 1, ξ, ξ^2, . . . , ξ^{2^r−2}}, prove the following:
(a) The nontrivial ideal 2R = {2t | t ∈ T (R)}.
(b) The elements in 2R are precisely 0 and the divisors of zero in R.
(c) The invertible elements in R are exactly the elements in R \ 2R.
Since f(x) is a monic basic irreducible polynomial of degree r in Z4[x], µ(f(x)) is an irreducible polynomial of degree r in F2[x]. The reduction homomorphism µ : Z4[x] → F2[x] induces a homomorphism µ from Z4[x]/(f(x)) onto F2[x]/(µ(f(x))) given by µ(a(x) + (f(x))) = µ(a(x)) + (µ(f(x))) with kernel (2). Thus if R is the Galois ring Z4[x]/(f(x)), the quotient ring R/2R is isomorphic to the field F_{2^r} with the primitive element ξ ∈ R mapped to a primitive element of F_{2^r}. The map µ is examined in Exercise 749.
Exercise 749 Prove that if f (x) is a monic basic irreducible polynomial in Z4 [x], then
µ : Z4 [x]/( f (x)) → F2 [x]/(µ( f (x))) given by µ(a(x) + ( f (x))) = µ(a(x)) + (µ( f (x)))
is a homomorphism onto F2 [x]/(µ( f (x))) with kernel (2); be sure that you verify that µ is
well-defined.
Let p be a prime and q = p^r. Recall from Theorem 3.6.1 that the automorphism group of the finite field Fq, called the Galois group Gal(Fq) of Fq, is a cyclic group of order r generated by the Frobenius automorphism σ_p : Fq → Fq, where σ_p(α) = α^p. There is a similar structure for the automorphism group of GR(4^r), also denoted Gal(GR(4^r)), and called the Galois group of GR(4^r). This group is also cyclic of order r, generated by the Frobenius automorphism ν2 : GR(4^r) → GR(4^r) defined by

ν2(c) = a^2 + 2b^2,

where a + 2b is the 2-adic representation of c. The elements of Fq fixed by σ_p are precisely the elements of the prime subfield Fp. Similarly, the elements of GR(4^r) fixed by ν2 are precisely the elements of the subring Z4.
Exercise 750 Let R = GR(4^r) have primitive element ξ. The goal of this exercise is to show that ν2 is indeed an automorphism of R and that the only elements of R fixed by ν2 are the elements of Z4. You may use the fact that every element c ∈ R is uniquely expressible in the form c = a + 2b, where a and b are in T(R) = {0, 1, ξ, ξ^2, . . . , ξ^{2^r−2}}. Define τ(c) = c^{2^r} for all c ∈ R.
(a) Show that if x and y are elements of R, then (x + 2y)^2 = x^2.
(b) Show that if a and b are in T(R), then τ(a + 2b) = a. Note: Given c ∈ R, this shows that τ(c) ∈ T(R) and also shows how to find a in order to write c = a + 2b with a and b in T(R).
(c) Show that if x and y are elements of R, then τ(x + y) = τ(x) + τ(y) + 2(xy)^{2^{r−1}}.
(d) Let a1, a2, b1, and b2 be in T(R). Show that (a1 + 2b1) + (a2 + 2b2) = (a1 + a2 + 2(a1a2)^{2^{r−1}}) + 2((a1a2)^{2^{r−1}} + b1 + b2) and that a1 + a2 + 2(a1a2)^{2^{r−1}} ∈ T(R). (Note that (a1a2)^{2^{r−1}} + b1 + b2 may not be in T(R) but 2((a1a2)^{2^{r−1}} + b1 + b2) is in 2T(R) = 2R.)
(e) Show that ν2(x + y) = ν2(x) + ν2(y) and ν2(xy) = ν2(x)ν2(y) for all x and y in R.
(f) Show that if a1 and a2 are elements of T(R), then a1^2 = a2^2 implies that a1 = a2. Hint: Note that ξ has odd multiplicative order 2^r − 1.
(g) Prove that ν2 is an automorphism of GR(4^r).
(h) Prove that the only elements a + 2b ∈ GR(4^r) with a and b in T(R) such that ν2(a + 2b) = a + 2b are those elements with a ∈ {0, 1} and b ∈ {0, 1}. (These elements are the elements of Z4.)
By writing elements of GR(4^r) in the form a + 2b where a and b are in T(R), we can easily work with the multiplicative structure in GR(4^r). We need to describe the additive structure in GR(4^r). To make this structure easier to understand, we restrict our choice of the basic irreducible polynomial f(x) used to define the Galois ring. Let n = 2^r − 1. There exists an irreducible polynomial f2(x) ∈ F2[x] of degree r that has a root in F_{2^r} of order n; recall that such a polynomial is called a primitive polynomial of F2[x]. Using Graeffe's method we can find a monic irreducible polynomial f(x) ∈ Z4[x] such that µ(f(x)) = f2(x); f(x) is called a primitive polynomial of Z4[x]. In the Galois ring GR(4^r) = Z4[x]/(f(x)), let ξ = x + (f(x)). Then f(ξ) = 0 and ξ is in fact a primitive element of GR(4^r). Every element c ∈ GR(4^r) can be expressed in its "multiplicative" form c = a + 2b with a and b in T(R) and in its additive form:

c = Σ_{i=0}^{r−1} c_i ξ^i,  where c_i ∈ Z4.    (12.21)
Exercise 751 Recall from Example 12.3.8 that f(x) = x^3 + 2x^2 + x − 1 is a basic irreducible polynomial in Z4[x] and an irreducible factor of x^7 − 1. Also, µ(f(x)) = x^3 + x + 1 is a primitive polynomial in F2[x]. In the Galois ring R = GR(4^3) = Z4[x]/(f(x)), let ξ = x + (f(x)). So ξ^3 + 2ξ^2 + ξ − 1 = 0 in GR(4^3). Do the following:
(a) Verify that the elements c ∈ T(R) and c ∈ 2T(R), when written in the additive form c = b0 + b1ξ + b2ξ^2, are represented as follows:

    Element  b0 b1 b2      Element  b0 b1 b2
    0        0  0  0       0        0  0  0
    1        1  0  0       2        2  0  0
    ξ        0  1  0       2ξ       0  2  0
    ξ^2      0  0  1       2ξ^2     0  0  2
    ξ^3      1  3  2       2ξ^3     2  2  0
    ξ^4      2  3  3       2ξ^4     0  2  2
    ξ^5      3  3  1       2ξ^5     2  2  2
    ξ^6      1  2  1       2ξ^6     2  0  2
(b) Compute ξ^7 and verify that it indeed equals 1.
(c) What is the additive form of ξ^2 + 2ξ^6?
(d) What is the multiplicative form of 3 + 2ξ + ξ^2?
(e) Multiply (ξ^3 + 2ξ^4)(ξ^2 + 2ξ^5) and write the answer in both multiplicative and additive forms.
(f) Add (ξ^2 + 2ξ^4) + (ξ^5 + 2ξ^6) and write the answer in both additive and multiplicative forms.
In [27], the Z4 -quadratic residue codes of length n were developed using roots of x n − 1
in a Galois ring. However, our approach in Section 12.4 was to avoid the use of these rings,
and hence the roots, and instead focus on the idempotent generators.
Exercise 752 In Example 12.3.8 we factored x^7 − 1 over Z4.
(a) Using the table given in Exercise 751, find the roots of g2(x) = x^3 + 2x^2 + x − 1 and g3(x) = x^3 − x^2 + 2x − 1. Hint: The roots are also roots of x^7 − 1 and hence are seventh roots of unity, implying that they are powers of ξ.
(b) What do you notice about the exponents of the roots of g2(x) and g3(x)?
In constructing the Kerdock codes, we will use the relative trace function from GR(4^r) into Z4, which is analogous to the trace function from F_{2^r} into F2. Recall that the trace function Trr : F_{2^r} → F2 is given by

Trr(α) = Σ_{i=0}^{r−1} α^{2^i} = Σ_{i=0}^{r−1} σ2^i(α)  for α ∈ F_{2^r},

where σ2 is the Frobenius automorphism of F_{2^r}. The relative trace function TRr : GR(4^r) → Z4 is given by

TRr(α) = Σ_{i=0}^{r−1} ν2^i(α)  for α ∈ GR(4^r).
By Exercise 753, TRr (α) is indeed an element of Z4 . By Lemma 3.8.5, Trr is a surjective
map; analogously, TRr is a surjective map.
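As an illustrative check (our own code, not the book's), one can compute TRr on all of GR(4^3): decompose each element as a + 2b by table lookup, apply the Frobenius ν2(a + 2b) = a^2 + 2b^2, and sum the iterates. Every trace value should land in Z4, and each of the four values should occur exactly 4^{3−1} = 16 times, matching surjectivity and Exercise 756(a).

```python
from itertools import product
from collections import Counter

MOD, DEG = 4, 3
RED = (1, 3, 2)    # x^3 = 1 + 3x + 2x^2 in GR(4^3) = Z4[x]/(x^3 + 2x^2 + x - 1)

def mul(u, v):
    w = [0] * (2 * DEG - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] = (w[i + j] + ui * vj) % MOD
    for k in range(2 * DEG - 2, DEG - 1, -1):
        c, w[k] = w[k], 0
        for t, r in enumerate(RED):
            w[k - DEG + t] = (w[k - DEG + t] + c * r) % MOD
    return tuple(w[:DEG])

def add(u, v):
    return tuple((x + y) % MOD for x, y in zip(u, v))

def dbl(u):
    return tuple(2 * x % MOD for x in u)

one, xi = (1, 0, 0), (0, 1, 0)
T = [(0, 0, 0), one]
while (p := mul(T[-1], xi)) != one:
    T.append(p)
split = {add(a, dbl(b)): (a, b) for a, b in product(T, T)}  # 2-adic decomposition

def nu2(c):
    """Frobenius automorphism: nu2(a + 2b) = a^2 + 2b^2."""
    a, b = split[c]
    return add(mul(a, a), dbl(mul(b, b)))

def trace(c):
    """Relative trace TR_3(c) = c + nu2(c) + nu2^2(c)."""
    total = (0, 0, 0)
    for _ in range(DEG):
        total, c = add(total, c), nu2(c)
    return total

counts = Counter(trace(c) for c in split)
print(sorted(counts.items()))
# every trace is a constant tuple (t, 0, 0), i.e. an element of Z4,
# and each of the four values occurs 16 times
```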
Exercise 753 Let ν2 be the Frobenius automorphism of GR(4^r) and TRr the relative trace map. Do the following:
(a) Show that ν2^r is the identity automorphism.
(b) Show that ν2(TRr(α)) = TRr(α). Why does this show that TRr(α) is an element of Z4?
(c) Show that TRr(α + β) = TRr(α) + TRr(β) for all α and β in GR(4^r).
(d) Show that TRr(aα) = aTRr(α) for all a ∈ Z4 and α ∈ GR(4^r).
12.7
Kerdock codes
We now define the binary Kerdock code of length 2^{r+1} as the Gray image of the extended code of a certain cyclic Z4-linear code of length n = 2^r − 1. Let H(x) be a primitive basic irreducible polynomial of degree r. Let f(x) be the reciprocal polynomial of (x^n − 1)/((x − 1)H(x)). As in Section 12.3.3, f(x) is a factor of x^n − 1. Define K(r + 1) to be the cyclic code of length 2^r − 1 over Z4 generated by f(x). By Corollary 12.3.14, K(r + 1) is a code of length n = 2^r − 1 and type 4^{n−deg f} = 4^{r+1}. Let K̂(r + 1) be the extended cyclic code obtained by adding an overall parity check to K(r + 1). This is a code of length 2^r and type 4^{r+1}. The Kerdock code 𝒦(r + 1) is defined to be the Gray image G(K̂(r + 1)) of K̂(r + 1). So 𝒦(r + 1) is a code of length 2^{r+1} with 4^{r+1} codewords. In [116] it is shown that a simple rearrangement of the coordinates leads directly to the original definition given by Kerdock in [166]. The results of this section are taken primarily from [116]. Earlier Nechaev [243] discovered a related connection between Z4-sequences and Kerdock codes punctured on two coordinates.
Example 12.7.1 If n = 2^3 − 1, we can let H(x) = x^3 + 2x^2 + x − 1. In that case, f^*(x) = (x^7 − 1)/((x − 1)H(x)) = x^3 − x^2 + 2x − 1 and f(x) = x^3 + 2x^2 + x − 1.
In this section GR(4^r) will denote the specific Galois ring Z4[x]/(H(x)). Let ξ be a primitive root of H(x) in GR(4^r). We will add the parity check position on the left of a codeword of K̂(r + 1) and label the check position ∞. Hence if c∞c0c1 · · · c_{n−1} is a codeword, the corresponding "extended" polynomial form would be (c∞, c(x)), where c(x) = c0 + c1x + · · · + c_{n−1}x^{n−1}.
Lemma 12.7.2 Let r ≥ 2. The following hold:
(i) A polynomial c(x) ∈ Rn is in K(r + 1) if and only if (x − 1)H(x)c^*(x) = 0 in Rn.
(ii) Generator matrices G_{r+1} and Ĝ_{r+1} for K(r + 1) and K̂(r + 1), respectively, are

    G_{r+1} = [ 1  1  1    · · ·  1       ]    Ĝ_{r+1} = [ 1  1  1  1    · · ·  1       ]
              [ 1  ξ  ξ^2  · · ·  ξ^{n−1} ],             [ 0  1  ξ  ξ^2  · · ·  ξ^{n−1} ].

We can replace each ξ^i by the column vector [b_{i,0} b_{i,1} · · · b_{i,r−1}]^T obtained from (12.21), where ξ^i = Σ_{j=0}^{r−1} b_{i,j} ξ^j.
Proof: Since Rn = ⟨(x − 1)H(x)⟩ ⊕ ⟨f^*(x)⟩ by Theorem 12.3.12, (x − 1)H(x)c^*(x) = 0 in Rn if and only if c^*(x) ∈ ⟨f^*(x)⟩ if and only if c(x) = f(x)s(x) for some s(x) ∈ Rn, proving (i).
For (ii), we note that the matrix obtained from [1 ξ ξ^2 · · · ξ^{r−1}] by replacing each ξ^i with the coefficients from (12.21) is the r × r identity matrix. Hence the r + 1 rows of Ĝ_{r+1} generate a code of type 4^{r+1}; the same will be true of the r + 1 rows of G_{r+1} once we show that the parity check of r1 = [1 1 1 · · · 1] is 1 and the parity check of r2 = [1 ξ ξ^2 · · · ξ^{n−1}] is 0. Since r1 has 2^r − 1 1s and 2^r − 1 ≡ 3 (mod 4) as r ≥ 2, the parity check for r1 is 1. The parity check for r2 is −Σ_{i=0}^{n−1} ξ^i = −(ξ^n − 1)/(ξ − 1) = 0 as ξ^n = 1.
To complete the proof of (ii), it suffices to show that c(x) ∈ K(r + 1) where c(x) is the polynomial associated to r1 and r2. For r1, let c(x) = Σ_{i=0}^{n−1} x^i. As (x − 1)c^*(x) = (x − 1) Σ_{i=0}^{n−1} x^i = x^n − 1 in Z4[x], (x − 1)H(x)c^*(x) = 0 in Rn showing c(x) ∈ K(r + 1) by (i). For r2, we are actually working with r possible codewords. We do not wish to deal with them individually and so we let c(x) = Σ_{i=0}^{n−1} ξ^i x^i, thus allowing the coefficients of c(x) to lie in the ring GR(4^r) rather than Z4. Clearly, by (i), we only need to show that H(x)c^*(x) = 0 even when we allow the coefficients of c(x) to be elements of GR(4^r). So c^*(x) = ξ Σ_{i=0}^{n−1} ξ^{n−1−i} x^i (where the ξ in front of the summation makes c^*(x) monic). Let H(x) = Σ_{i=0}^{n−1} H_i x^i. By Exercise 754, the coefficient of x^k for 0 ≤ k ≤ n − 1 in the product H(x)c^*(x) is

ξ Σ_{i+j≡k (mod n)} H_i ξ^{n−1−j} = ξ Σ_{i=0}^{n−1} H_i ξ^{n−1−k+i} = ξ^{n−k} Σ_{i=0}^{n−1} H_i ξ^i,    (12.22)

since ξ^n = 1. The right-hand side of (12.22) is ξ^{n−k} H(ξ) = 0 as ξ is a root of H(x). Therefore (x − 1)H(x)c^*(x) = 0.
We remark that (x − 1)H(x) is called the check polynomial of K(r + 1), analogous to the terminology used in Section 4.4 for cyclic codes over Fq. Notice that the matrix G_{r+1} of Lemma 12.7.2 is reminiscent of the parity check matrix for the subcode of even weight vectors in the binary Hamming code, where we write this parity check matrix over an extension field of F2, rather than over F2.
Exercise 754 Let R be a commutative ring with a(x) = Σ_{i=0}^{n−1} a_i x^i and b(x) = Σ_{i=0}^{n−1} b_i x^i in R[x]/(x^n − 1). Prove that a(x)b(x) = p(x) = Σ_{k=0}^{n−1} p_k x^k in R[x]/(x^n − 1), where

p_k = Σ_{i+j≡k (mod n)} a_i b_j  with 0 ≤ i ≤ n − 1 and 0 ≤ j ≤ n − 1.
Example 12.7.3 Using the basic irreducible polynomial from Exercise 751, a generator matrix for K̂(4) from Lemma 12.7.2 is

          [ 1  1  1  1  1  1  1  1 ]
    Ĝ_4 = [ 0  1  0  0  1  2  3  1 ]
          [ 0  0  1  0  3  3  3  2 ]
          [ 0  0  0  1  2  3  1  1 ].
Exercise 755 Show that the generator matrices in Examples 12.2.5 and 12.7.3 generate
the same Z4 -linear code.
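For a machine check of the type claim (our own sketch, assuming only the matrix Ĝ_4 above), one can span the four rows over Z4: the result has 4^4 = 256 distinct codewords, each with coordinate sum ≡ 0 (mod 4), as an extended code should.

```python
from itertools import product

# Rows of the generator matrix G4_hat from Example 12.7.3
G4 = [
    (1, 1, 1, 1, 1, 1, 1, 1),
    (0, 1, 0, 0, 1, 2, 3, 1),
    (0, 0, 1, 0, 3, 3, 3, 2),
    (0, 0, 0, 1, 2, 3, 1, 1),
]

codewords = set()
for coeffs in product(range(4), repeat=4):          # all Z4-linear combinations
    word = tuple(sum(a * row[i] for a, row in zip(coeffs, G4)) % 4
                 for i in range(8))
    codewords.add(word)

print(len(codewords))                               # 256 = 4^4: the code has type 4^4
print(all(sum(w) % 4 == 0 for w in codewords))      # True: overall parity check holds
```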
We now use the relative trace to list all the codewords in K(r + 1) and K̂(r + 1). Let 1_n denote the all-one vector of length n.
Lemma 12.7.4 Let r ≥ 2 and n = 2^r − 1. Then c ∈ K(r + 1) if and only if there exist λ ∈ R = GR(4^r) and ε ∈ Z4 such that

c = (TRr(λ), TRr(λξ), TRr(λξ^2), . . . , TRr(λξ^{n−1})) + ε 1_n.    (12.23)

The parity check for c is ε.
Proof: Using Exercise 753, Σ_{i=0}^{n−1} (TRr(λξ^i) + ε) = TRr(λ Σ_{i=0}^{n−1} ξ^i) + (2^r − 1)ε = TRr(λ(ξ^n − 1)/(ξ − 1)) + (2^r − 1)ε = TRr(0) + (2^r − 1)ε = (2^r − 1)ε ≡ 3ε (mod 4). So the parity check for c is ε.
Let C = {(TRr(λ), TRr(λξ), TRr(λξ^2), . . . , TRr(λξ^{n−1})) + ε 1_n | λ ∈ GR(4^r), ε ∈ Z4}. We first show that C has 4^{r+1} codewords; this is clear if we show that TRr(λξ^i) + ε = TRr(λ1ξ^i) + ε1 for 0 ≤ i ≤ n − 1 implies λ = λ1 and ε = ε1. But ε = ε1 since the parity check of c is ε. Therefore by Exercise 753, TRr(ζξ^i) = 0 for 0 ≤ i ≤ n − 1 where ζ = λ − λ1. We only need to show that ζ = 0. Since TRr(ζξ^i) = 0, TRr(ζ(2ξ^i)) = 0 and hence TRr(ζs) = 0 for all s ∈ R again by Exercise 753. Thus TRr is 0 on the ideal (ζ). Since TRr is surjective, there is an α ∈ R such that TRr(α) = 1 and hence TRr(2α) = 2. Therefore by Lemma 12.3.9, (ζ) can only be the zero ideal and hence ζ = 0.
We now show that if c has the form in (12.23), then c ∈ K(r + 1). Since 1_n ∈ K(r + 1), we only need to show that c ∈ K(r + 1) when ε = 0. If λ = 0, the result is clear. If λ ≠ 0, then c(x) ≠ 0 by the above argument. Now c^*(x) is some cyclic shift of p(x) = α Σ_{i=0}^{n−1} TRr(λξ^{n−1−i}) x^i for some α ∈ Z4 chosen so that the leading coefficient is 1 or 2. (c^*(x) may not equal p(x) without the shift as we do not know the degree of c(x).) By Lemma 12.7.2, c ∈ K(r + 1) if H(x)p(x) = 0 in Rn. By the same argument as in the proof of Lemma 12.7.2, the coefficient of x^k for 0 ≤ k ≤ n − 1 in the product H(x)p(x) is

α Σ_{i+j≡k (mod n)} H_i TRr(λξ^{n−1−j}) = α TRr(λ Σ_{i=0}^{n−1} H_i ξ^{n−1−k+i}) = α TRr(λ ξ^{n−1−k} Σ_{i=0}^{n−1} H_i ξ^i),

using Exercise 753, as H_i ∈ Z4. The latter trace is TRr(λξ^{n−1−k} H(ξ)) = TRr(0) = 0 and hence H(x)p(x) = 0 in Rn. So c ∈ K(r + 1). Since K(r + 1) and C each have 4^{r+1} codewords and C ⊆ K(r + 1), C = K(r + 1).
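The parametrization in Lemma 12.7.4 can be verified directly for r = 3 (an illustrative sketch that rebuilds GR(4^3) from scratch; all helper names are ours): letting λ range over the 64 ring elements and ε over Z4 yields 4^4 = 256 distinct vectors of length 7, each with coordinate sum 3ε (mod 4), so its parity check is indeed ε.

```python
from itertools import product

MOD, DEG, N = 4, 3, 7
RED = (1, 3, 2)    # x^3 = 1 + 3x + 2x^2 in GR(4^3) = Z4[x]/(x^3 + 2x^2 + x - 1)

def mul(u, v):
    w = [0] * (2 * DEG - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] = (w[i + j] + ui * vj) % MOD
    for k in range(2 * DEG - 2, DEG - 1, -1):
        c, w[k] = w[k], 0
        for t, r in enumerate(RED):
            w[k - DEG + t] = (w[k - DEG + t] + c * r) % MOD
    return tuple(w[:DEG])

add = lambda u, v: tuple((x + y) % MOD for x, y in zip(u, v))
dbl = lambda u: tuple(2 * x % MOD for x in u)

one, xi = (1, 0, 0), (0, 1, 0)
T = [(0, 0, 0), one]
while (p := mul(T[-1], xi)) != one:
    T.append(p)
split = {add(a, dbl(b)): (a, b) for a, b in product(T, T)}

def nu2(c):
    a, b = split[c]
    return add(mul(a, a), dbl(mul(b, b)))

def trace(c):                           # TR_3(c) = c + nu2(c) + nu2^2(c), in Z4
    total = (0, 0, 0)
    for _ in range(DEG):
        total, c = add(total, c), nu2(c)
    return total[0]

vectors, parity_ok = set(), True
for lam in split:                       # all 64 elements of GR(4^3)
    v, p = [], lam
    for _ in range(N):                  # components TR_3(lam * xi^i), i = 0..6
        v.append(trace(p))
        p = mul(p, xi)
    for eps in range(4):
        w = tuple((c + eps) % 4 for c in v)
        vectors.add(w)
        parity_ok &= (sum(w) % 4 == 3 * eps % 4)

print(len(vectors), parity_ok)
```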
Exercise 756 Let R = GR(4^r) with primitive element ξ; let n = 2^r − 1. Using the fact that TRr : R → Z4 is a surjective group homomorphism under addition (Exercise 753), show the following:
(a) For each a ∈ Z4, there are exactly 4^{r−1} elements of R with relative trace equal to a.
(b) For each s ∈ {0, 2, 2ξ, 2ξ^2, . . . , 2ξ^{n−1}} = 2R, TRr(s) ∈ {0, 2}, with at least one value equaling 2.
(c) For a = 0 or 2, there are exactly 2^{r−1} elements of 2R with relative trace equal to a.
Before we give the Lee weight distribution of K̂(r + 1), we need one more computational result.
Lemma 12.7.5 Let R = GR(4^r) have primitive element ξ. Let n = 2^r − 1. Suppose that λ ∈ R but λ ∉ 2R. Then all the elements of S = S1 ∪ S2, where S1 = {λ(ξ^j − ξ^k) | 0 ≤ j ≤ n − 1, 0 ≤ k ≤ n − 1, j ≠ k} and S2 = {±λξ^j | 0 ≤ j ≤ n − 1}, are distinct, and they are precisely the elements of R \ 2R.
Proof: By Exercise 748 the set of invertible elements in R is R \ 2R. We only need to show that the elements of S are invertible and distinct because R \ 2R has 4^r − 2^r elements and, if the elements of S are distinct, S has (2^r − 1)(2^r − 2) + 2(2^r − 1) = 4^r − 2^r elements. Since λ is invertible, the elements of S2 are invertible. Clearly, the elements of S2 are distinct as −ξ^j = ξ^j + 2ξ^j cannot equal ξ^k for any j and k.
To show the elements of S1 are invertible it suffices to show that ξ^j − ξ^k is invertible for j ≠ k. If this element is not invertible, ξ^j − ξ^k ∈ 2R. Recall that the reduction modulo 2 homomorphism µ : R → F_{2^r} = R/2R, discussed before Exercise 749, maps the primitive element ξ of R to the primitive element µ(ξ) = θ of F_{2^r}. Thus µ(ξ)^j + µ(ξ)^k = 0, implying that θ^j = θ^k. But θ^j ≠ θ^k for j ≠ k with j and k between 0 and n − 1. Thus the elements of S1 are invertible. We now show that the elements of S1 are distinct. Suppose that ξ^j − ξ^k = ξ^ℓ − ξ^m with j ≠ k and ℓ ≠ m; then

1 + ξ^a = ξ^b + ξ^c,    (12.24)

where m − j ≡ a (mod n), ℓ − j ≡ b (mod n), and k − j ≡ c (mod n). So (1 + ξ^a)^2 − ν2(1 + ξ^a) = (ξ^b + ξ^c)^2 − ν2(ξ^b + ξ^c), implying that 1 + ξ^{2a} + 2ξ^a − ν2(1) − ν2(ξ^a) = ξ^{2b} + ξ^{2c} + 2ξ^{b+c} − ν2(ξ^b) − ν2(ξ^c), where ν2 is the Frobenius automorphism. Thus 2ξ^a = 2ξ^{b+c}, and hence a ≡ b + c (mod n). Therefore if x = θ^a, y = θ^b, and z = θ^c, then x = yz. Applying µ to (12.24), we also have 1 + x = y + z. Hence 0 = 1 + yz + y + z = (1 + y)(1 + z) holds in F_{2^r}, implying y = 1 or z = 1. So b ≡ 0 (mod n) or c ≡ 0 (mod n). Hence ℓ = j or k = j; as the latter is impossible, we have ℓ = j, which yields k = m as well. So the elements of S1 are distinct.
Finally, we show that the elements of S1 ∪ S2 are distinct. Assume ξ^j − ξ^k = ±ξ^m. By rearranging and factoring out a power of ξ, 1 + ξ^a = ξ^b. So (1 + ξ^a)^2 − ν2(1 + ξ^a) = ξ^{2b} − ν2(ξ^b), giving 1 + 2ξ^a + ξ^{2a} − (1 + ξ^{2a}) = ξ^{2b} − ξ^{2b}. Therefore 2ξ^a = 0, which is a contradiction.
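This partition can also be confirmed by enumeration for r = 3 (our own check, with the Galois ring rebuilt as in the earlier sketches): taking λ = 1, the 42 differences ξ^j − ξ^k with j ≠ k and the 14 elements ±ξ^j are pairwise distinct and together exhaust the 4^3 − 2^3 = 56 invertible elements.

```python
from itertools import product

MOD, DEG = 4, 3
RED = (1, 3, 2)    # x^3 = 1 + 3x + 2x^2 in GR(4^3) = Z4[x]/(x^3 + 2x^2 + x - 1)

def mul(u, v):
    w = [0] * (2 * DEG - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] = (w[i + j] + ui * vj) % MOD
    for k in range(2 * DEG - 2, DEG - 1, -1):
        c, w[k] = w[k], 0
        for t, r in enumerate(RED):
            w[k - DEG + t] = (w[k - DEG + t] + c * r) % MOD
    return tuple(w[:DEG])

sub = lambda u, v: tuple((x - y) % MOD for x, y in zip(u, v))
neg = lambda u: tuple(-x % MOD for x in u)

one, xi = (1, 0, 0), (0, 1, 0)
P = [one]
while (p := mul(P[-1], xi)) != one:    # the 7 powers xi^0, ..., xi^6
    P.append(p)

S1 = {sub(a, b) for a in P for b in P if a != b}   # lambda = 1 in the lemma
S2 = set(P) | {neg(a) for a in P}
twoR = {tuple(2 * x % MOD for x in a) for a in [(0, 0, 0)] + P}
units = set(product(range(MOD), repeat=DEG)) - twoR

print(len(S1), len(S2))                 # 42 14: all differences and negatives distinct
print(S1 | S2 == units, S1 & S2)        # S1 and S2 are disjoint and partition R \ 2R
```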
We now have the tools to find the Lee weight distribution of K̂(r + 1), which is also the Hamming weight distribution of the binary Kerdock code 𝒦(r + 1) = G(K̂(r + 1)) by Theorem 12.2.2.
Theorem 12.7.6 Let r ≥ 3 with r odd. Let A_i be the number of vectors in K̂(r + 1) of Lee weight i. Then

    A_i = 1                    if i = 0,
          2^{r+1}(2^r − 1)     if i = 2^r − 2^{(r−1)/2},
          2^{r+2} − 2          if i = 2^r,
          2^{r+1}(2^r − 1)     if i = 2^r + 2^{(r−1)/2},
          1                    if i = 2^{r+1},
          0                    otherwise.

This is also the Hamming weight distribution of the Kerdock code 𝒦(r + 1).
Proof: Let R = GR(4^r) and

v_λ = (TRr(λ), TRr(λξ), TRr(λξ^2), . . . , TRr(λξ^{n−1})),

where n = 2^r − 1. By Lemma 12.7.4, the vectors in K̂(r + 1) have the form (0, v_λ) + ε 1_{n+1} for λ ∈ R and ε ∈ Z4. We compute the Lee weight of each of these vectors by examining three cases. The first two cases give A_i for i = 0, 2^r, and 2^{r+1}. The last case gives the remaining A_i.
Case I: λ = 0
In this case v_λ is the zero vector, and (0, v_λ) + ε 1_{n+1} = ε 1_{n+1} has Lee weight 0 if ε = 0, n + 1 = 2^r if ε = 1 or 3, and 2(n + 1) = 2^{r+1} if ε = 2. This case contributes one vector of Lee weight 0, two vectors of Lee weight 2^r, and one vector of Lee weight 2^{r+1}.
Case II: λ ∈ 2R with λ ≠ 0
Thus λ = 2ξ^i for some 0 ≤ i ≤ n − 1. Hence the set of components of v_λ is always {TRr(2ξ^j) | 0 ≤ j ≤ n − 1} (in some order) independent of i. But by Exercise 756(c), exactly 2^{r−1} of these values are 2 and the remaining 2^{r−1} − 1 values are 0. Thus every vector (0, v_λ) + ε 1_{n+1} has either 2^{r−1} 2s and 2^{r−1} 0s (when ε = 0 or 2) or 2^{r−1} 1s and 2^{r−1} 3s (when ε = 1 or 3). In all cases these vectors have Lee weight 2^r. This case contributes 4n = 2^{r+2} − 4 vectors of Lee weight 2^r.
Case III: λ ∈ R with λ ∉ 2R
Let R# = R \ 2R. For j ∈ Z4, let n_j = n_j(v_λ) be the number of components of v_λ equal to j. We set up a sequence of equations involving the n_j and then solve these equations. Let i = √−1. Define

S = Σ_{j=0}^{n−1} i^{TRr(λξ^j)} = n_0 − n_2 + i(n_1 − n_3).    (12.25)
If S̄ is the complex conjugate of S, then

S S̄ = Σ_{j=0}^{n−1} i^{TRr(λ(ξ^j − ξ^j))} + Σ_{j≠k} i^{TRr(λ(ξ^j − ξ^k))}
    = 2^r − 1 + Σ_{a∈R#} i^{TRr(a)} − Σ_{j=0}^{n−1} i^{TRr(λξ^j)} − Σ_{j=0}^{n−1} i^{TRr(−λξ^j)}
    = 2^r − 1 + Σ_{a∈R#} i^{TRr(a)} − S − S̄

by Lemma 12.7.5. But Σ_{a∈R#} i^{TRr(a)} = Σ_{a∈R} i^{TRr(a)} − Σ_{a∈2R} i^{TRr(a)} = 0 − 0 = 0 by Exercise 756. Rearranging gives (S + 1)(S̄ + 1) = 2^r and therefore

(n_0 − n_2 + 1)^2 + (n_1 − n_3)^2 = 2^r    (12.26)

by (12.25). Thus we need integer solutions to the equation x^2 + y^2 = 2^r. By Exercise 757, the solutions are x = δ_1 2^{(r−1)/2} and y = δ_2 2^{(r−1)/2}, where δ_1 and δ_2 are ±1.⁴ By (12.26),

n_0 − n_2 = −1 + δ_1 2^{(r−1)/2}  and    (12.27)
n_1 − n_3 = δ_2 2^{(r−1)/2}.    (12.28)

However, 2v_λ = v_{2λ} is a vector from Case II, which means that

n_0 + n_2 = 2^{r−1} − 1  and    (12.29)
n_1 + n_3 = 2^{r−1}.    (12.30)

Solving (12.27), (12.28), (12.29), and (12.30) produces

n_0 = 2^{r−2} + δ_1 2^{(r−3)/2} − 1,  n_1 = 2^{r−2} + δ_2 2^{(r−3)/2},    (12.31)
n_2 = 2^{r−2} − δ_1 2^{(r−3)/2},  and  n_3 = 2^{r−2} − δ_2 2^{(r−3)/2}.    (12.32)
We can easily obtain n_j((0, v_λ) + ε 1_{n+1}) in terms of n_j = n_j(v_λ) as the following table shows:

    ε\j   0        1        2        3        Lee weight
    0     n_0 + 1  n_1      n_2      n_3      2^r − δ_1 2^{(r−1)/2}
    1     n_3      n_0 + 1  n_1      n_2      2^r + δ_2 2^{(r−1)/2}
    2     n_2      n_3      n_0 + 1  n_1      2^r + δ_1 2^{(r−1)/2}
    3     n_1      n_2      n_3      n_0 + 1  2^r − δ_2 2^{(r−1)/2}
The column "Lee weight" is obtained from the equation wt_L((0, v_λ) + ε 1_{n+1}) = n_1((0, v_λ) + ε 1_{n+1}) + 2 n_2((0, v_λ) + ε 1_{n+1}) + n_3((0, v_λ) + ε 1_{n+1}) by using (12.31) and (12.32). Thus two of the four codewords have Lee weight 2^r − 2^{(r−1)/2}, while the other two have Lee weight 2^r + 2^{(r−1)/2}. Therefore Case III contributes 2(4^r − 2^r) = 2^{r+1}(2^r − 1) codewords of Lee weight 2^r − 2^{(r−1)/2} and 2(4^r − 2^r) = 2^{r+1}(2^r − 1) codewords of Lee weight 2^r + 2^{(r−1)/2}.
⁴ This is the only place in this proof where we require r to be odd.
Exercise 757 Let r ≥ 3 be odd. Let x and y be integer solutions of x^2 + y^2 = 2^r. Do the following:
(a) Show that x and y are both even.
(b) Show that x and y are integer solutions of x^2 + y^2 = 2^r if and only if x_1 and y_1 are integer solutions of x_1^2 + y_1^2 = 2^{r−2}, where x_1 = x/2 and y_1 = y/2.
(c) Prove that the only solutions of x^2 + y^2 = 2^r are x = ±2^{(r−1)/2} and y = ±2^{(r−1)/2}.
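Part (c) is easy to confirm by brute force for small odd r (an illustrative check of ours):

```python
def solutions(r):
    """All integer pairs (x, y) with x^2 + y^2 = 2^r."""
    target = 2 ** r
    bound = int(target ** 0.5) + 1
    return {(x, y)
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)
            if x * x + y * y == target}

for r in (3, 5, 7):
    m = 2 ** ((r - 1) // 2)
    expected = {(sx * m, sy * m) for sx in (-1, 1) for sy in (-1, 1)}
    print(r, solutions(r) == expected)   # 3 True / 5 True / 7 True
```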
Exercise 758 Verify the entries in the table in Case III of the proof of Theorem 12.7.6.
Exercise 759 Find the Lee weight distribution of K̂(4), which is the octacode o8 of Example 12.2.5 by Exercise 755, and hence the weight distribution of 𝒦(4) = G(K̂(4)). Show that this is the same as the weight distribution of the Nordstrom–Robinson code described in Section 2.3.4.
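Exercise 759 can also be done by machine; the sketch below (our own, reusing the generator matrix Ĝ_4 of Example 12.7.3) tabulates the Lee weights of all 256 codewords. The result should match Theorem 12.7.6 with r = 3, which is exactly the weight distribution of the Nordstrom–Robinson code.

```python
from itertools import product
from collections import Counter

G4 = [
    (1, 1, 1, 1, 1, 1, 1, 1),
    (0, 1, 0, 0, 1, 2, 3, 1),
    (0, 0, 1, 0, 3, 3, 3, 2),
    (0, 0, 0, 1, 2, 3, 1, 1),
]
LEE = (0, 1, 2, 1)                     # Lee weights of the Z4 symbols 0, 1, 2, 3

dist = Counter()
for coeffs in product(range(4), repeat=4):
    word = tuple(sum(a * row[i] for a, row in zip(coeffs, G4)) % 4
                 for i in range(8))
    dist[sum(LEE[s] for s in word)] += 1

print(dict(sorted(dist.items())))
# {0: 1, 6: 112, 8: 30, 10: 112, 16: 1}, as Theorem 12.7.6 predicts for r = 3
```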
12.8
Preparata codes
The binary Preparata codes were originally defined in [288]. They are nonlinear, distance invariant codes of length 2^{r+1} and minimum distance 6. Also they are subsets of extended Hamming codes. The distance distribution of the Preparata code of length 2^{r+1} is related to the distance distribution of the Kerdock code of length 2^{r+1} by the MacWilliams equations. In [116], the authors observed that G(K̂(r + 1)^⊥) has the same distance distribution as the original Preparata code of length 2^{r+1}. However, this code is not a subset of the extended Hamming code, as the original Preparata codes are, but rather of a nonlinear code with the same weight distribution as the extended Hamming code. So these codes are "Preparata-like."
For r ≥ 3 and r odd, let P(r + 1) = K̂(r + 1)^⊥ and 𝒫(r + 1) = G(P(r + 1)).
Theorem 12.8.1 Let r ≥ 3 be odd. Then P(r + 1) and 𝒫(r + 1) are distance invariant codes. The Lee weight enumerator of P(r + 1), which is also the Hamming weight enumerator of 𝒫(r + 1), is

Lee_{P(r+1)}(x, y) = (1/4^{r+1}) [ (y − x)^{2^{r+1}} + (y + x)^{2^{r+1}}
    + 2^{r+1}(2^r − 1)(y − x)^{2^r − 2^{(r−1)/2}} (y + x)^{2^r + 2^{(r−1)/2}}
    + 2^{r+1}(2^r − 1)(y − x)^{2^r + 2^{(r−1)/2}} (y + x)^{2^r − 2^{(r−1)/2}}
    + (2^{r+2} − 2)(y − x)^{2^r} (y + x)^{2^r} ].

Alternately, if B_j(r + 1) is the number of codewords in P(r + 1) of Lee weight j, which is also the number of codewords in 𝒫(r + 1) of Hamming weight j, then

B_j(r + 1) = (1/4^{r+1}) [ K_j^{2^{r+1},2}(0) + 2^{r+1}(2^r − 1) K_j^{2^{r+1},2}(2^r − 2^{(r−1)/2})
    + (2^{r+2} − 2) K_j^{2^{r+1},2}(2^r) + 2^{r+1}(2^r − 1) K_j^{2^{r+1},2}(2^r + 2^{(r−1)/2})
    + K_j^{2^{r+1},2}(2^{r+1}) ],

where K_j^{2^{r+1},2} is the Krawtchouck polynomial defined in Chapter 2. Furthermore, the minimum distance of 𝒫(r + 1) is 6 and B_j(r + 1) = 0 if j is odd.
Proof: The formula for Lee_{P(r+1)}(x, y) follows from (12.6) and Theorem 12.7.6. The alternate form comes from (K) of Section 7.2. When applying (K) we must decide what values to use for q and n in K_j^{n,q}. We use q = 2 because the MacWilliams transform for the Lee weight enumerator satisfies (12.6), which is the same form used for binary codes (see (M3) in Section 7.2). We use n = 2 · 2^r = 2^{r+1} because of the presence of "2n" in the exponent in (12.4). Another way to understand the choice of q and n is to realize that the Lee weight distribution of P(r + 1) is the same as the Hamming weight distribution of 𝒫(r + 1), which is a binary code of length 2^{r+1}. The last statement follows from Exercise 760.
Exercise 760 Do the following:
(a) If r ≥ 3 and r is odd, show that the Lee weight of any codeword in P(r + 1) is even. Hint: See Exercise 690.
(b) Using the formula for B_j(r + 1) in Theorem 12.8.1 (and a computer algebra package), show that B_2(r + 1) = B_4(r + 1) = 0 and B_6(r + 1) = (−2^{r+1} + 7 · 2^{2r} − 7 · 2^{3r} + 2^{4r+1})/45.
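For r = 3 no computer algebra package is needed; a short script (ours; K2 below stands for the binary Krawtchouk polynomial K_j^{n,2}) evaluates the formula of Theorem 12.8.1 and recovers B_2 = B_4 = 0 and B_6 = 5040/45 = 112:

```python
from math import comb

def K2(n, j, x):
    """Binary Krawtchouk polynomial K_j^{n,2}(x)."""
    return sum((-1) ** i * comb(x, i) * comb(n - x, j - i) for i in range(j + 1))

r = 3
n = 2 ** (r + 1)                                   # length 16
s = 2 ** ((r - 1) // 2)                            # 2^((r-1)/2) = 2
A = {                                              # Lee distribution of K_hat(4)
    0: 1,
    2 ** r - s: 2 ** (r + 1) * (2 ** r - 1),       # weight 6: 112 codewords
    2 ** r: 2 ** (r + 2) - 2,                      # weight 8: 30 codewords
    2 ** r + s: 2 ** (r + 1) * (2 ** r - 1),       # weight 10: 112 codewords
    2 ** (r + 1): 1,                               # weight 16: 1 codeword
}

B = {j: sum(c * K2(n, j, w) for w, c in A.items()) // 4 ** (r + 1)
     for j in range(n + 1)}
print(B[0], B[2], B[4], B[6])   # 1 0 0 112
```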
13
Codes from algebraic geometry
Since the discovery of codes using algebraic geometry by V. D. Goppa in 1977 [105],
there has been a great deal of research on these codes. Their importance was realized when
in 1982 Tsfasman, Vlăduţ, and Zink [333] proved that certain algebraic geometry codes
exceeded the Asymptotic Gilbert–Varshamov Bound, a feat many coding theorists felt could
never be achieved. Algebraic geometry codes, now often called geometric Goppa codes,
were originally developed using many extensive and deep results from algebraic geometry.
These codes are defined using algebraic curves. They can also be defined using algebraic
function fields as there is a one-to-one correspondence between “nice” algebraic curves and
these function fields. The reader interested in the connection between these two theories can
consult [320, Appendix B]. Another approach appeared in the 1998 publication by Høholdt,
van Lint, and Pellikaan [135], where the theory of order and weight functions was used to
describe a certain class of geometric Goppa codes.
In this chapter we choose to introduce a small portion of the theory of algebraic curves,
enough to allow us to define algebraic geometry codes and present some simple examples.
We will follow a very readable treatment of the subject by J. L. Walker [343]. Her monograph
would make an excellent companion to this chapter. For those who want to learn more about
the codes and their decoding but have a limited understanding of algebraic geometry, the
Høholdt, van Lint, and Pellikaan chapter in the Handbook of Coding Theory [135] can
be examined. For those who have either a strong background in algebraic geometry or
a willingness to learn it, the texts by C. Moreno [240] or H. Stichtenoth [320] connect
algebraic geometry to coding.
13.1
Affine space, projective space, and homogenization
Algebraic geometry codes are defined with respect to curves in either affine or projective
space. In this section we introduce these concepts.
Let F be a field, possibly infinite. We define n-dimensional affine space over F, denoted A^n(F), to be the ordinary n-dimensional vector space F^n; the points in A^n(F) are (x_1, x_2, . . . , x_n) with x_i ∈ F. Defining n-dimensional projective space is more complicated. First, let V_n be the nonzero vectors in F^{n+1}. If x = (x_1, x_2, . . . , x_{n+1}) and x′ = (x′_1, x′_2, . . . , x′_{n+1}) are in V_n, we say that x and x′ are equivalent, denoted x ∼ x′, if there exists a nonzero λ ∈ F such that x′_i = λx_i for 1 ≤ i ≤ n + 1. The relation ∼ is indeed an equivalence relation as Exercise 761 shows. The equivalence classes are denoted (x_1 : x_2 : · · · : x_{n+1}) and consist of all nonzero scalar multiples of (x_1, x_2, . . . , x_{n+1}). We define n-dimensional projective space over F, denoted P^n(F), to be the set of all equivalence classes (x_1 : x_2 : · · · : x_{n+1}), called points, with x_i ∈ F. Note that the zero vector is excluded in projective space, and the points of P^n(F) can be identified with the 1-dimensional subspaces of A^{n+1}(F). The coordinates x_1, x_2, . . . , x_{n+1} are called homogeneous coordinates of P^n(F). If P = (x_1 : x_2 : · · · : x_{n+1}) ∈ P^n(F) with x_{n+1} = 0, then P is called a point at infinity. Points in P^n(F) not at infinity are called affine points. By Exercise 761, each affine point in P^n(F) can be uniquely represented as (x_1 : x_2 : · · · : x_n : 1), and each point at infinity has a zero in its right-most coordinate and can be represented uniquely with a one in its right-most nonzero coordinate. P^1(F) is called the projective line over F, and P^2(F) is called the projective plane over F.
Exercise 761 Do the following:
(a) Prove that ∼ is an equivalence relation on V_n.
(b) Show that the affine points of P^n(F) can be represented uniquely by (x_1 : x_2 : · · · : x_n : 1) and hence correspond in a natural one-to-one manner with the points of A^n(F).
(c) Show that the points at infinity have a zero in the right-most coordinate and can be represented uniquely with a one in the right-most nonzero coordinate.
(d) Prove that if F = Fq, each equivalence class (point) (x_1 : x_2 : · · · : x_{n+1}) contains q − 1 vectors from V_n.
(e) Prove that P^n(Fq) contains q^n + q^{n−1} + · · · + q + 1 points.
(f) Prove that P^n(Fq) contains q^{n−1} + q^{n−2} + · · · + q + 1 points at infinity.
Example 13.1.1 The projective line P^1(Fq) has one point at infinity (1 : 0) and q affine points (x_1 : 1) where x_1 ∈ Fq. The projective plane P^2(Fq) has one point at infinity of the form (1 : 0 : 0), q points at infinity of the form (x_1 : 1 : 0) with x_1 ∈ Fq, and q^2 affine points (x_1 : x_2 : 1) where x_1, x_2 ∈ Fq.
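These counts are straightforward to reproduce by enumeration (an illustrative sketch of ours; it normalizes each nonzero vector by the inverse of its right-most nonzero coordinate, so it is written for prime q only):

```python
from itertools import product

def projective_points(q, n):
    """Representatives of P^n(F_q) with a one in the right-most nonzero coordinate.
    Assumes q is prime so that arithmetic mod q is field arithmetic."""
    pts = set()
    for v in product(range(q), repeat=n + 1):
        if any(v):
            pivot = max(i for i, c in enumerate(v) if c)   # right-most nonzero
            inv = pow(v[pivot], q - 2, q)                  # Fermat inverse mod q
            pts.add(tuple(c * inv % q for c in v))
    return pts

for q in (2, 3, 5):
    pts = projective_points(q, 2)
    at_infinity = {p for p in pts if p[-1] == 0}
    print(q, len(pts), len(at_infinity))   # q^2 + q + 1 points, q + 1 at infinity
```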
Exercise 762 In Section 6.5 we defined a projective plane to be a set of points and a set of lines, consisting of points, such that any two distinct points determine a unique line that passes through these two points and any two distinct lines have exactly one point in common. In this exercise, you will show that P^2(F) is a projective plane in the sense of Section 6.5. To do this you need to know what the lines in P^2(F) are. Let a, b, and c be in F, with at least one being nonzero. The set of points whose homogeneous coordinates satisfy ax_1 + bx_2 + cx_3 = 0 form a line.
(a) Show that if ax_1 + bx_2 + cx_3 = 0 and (x_1 : x_2 : x_3) = (x′_1 : x′_2 : x′_3), then ax′_1 + bx′_2 + cx′_3 = 0. (This shows that it does not matter which representation you use for a point in order to decide if the point is on a given line or not.)
(b) Prove that two distinct points of P^2(F) determine a unique line that passes through these two points.
(c) Prove that any two distinct lines of P^2(F) have exactly one point in common.
(d) Parts (b) and (c) show that P^2(F) is a projective plane in the sense of Section 6.5. If F = Fq, what is the order of this plane?
Later we will deal with polynomials and rational functions (the ratios of polynomials). It will be useful to restrict to homogeneous polynomials or the ratios of homogeneous polynomials. In particular, we will be interested in the zeros of polynomials and in the zeros and poles of rational functions.
Let x_1, . . . , x_n be n independent indeterminates and let F[x_1, . . . , x_n] denote the set of all polynomials in these indeterminates with coefficients from F. A polynomial f in F[x_1, . . . , x_n] is homogeneous of degree d if every term of f is of degree d. If a polynomial f of degree d is not homogeneous, we can "make" it be homogeneous by adding one more variable and appending powers of that variable to each term in f so that each modified term has degree d. This process is called the homogenization of f and the resulting polynomial is denoted f^H. More formally,

f^H(x_1, x_2, . . . , x_n, x_{n+1}) = x_{n+1}^d f(x_1/x_{n+1}, x_2/x_{n+1}, . . . , x_n/x_{n+1}),    (13.1)

where f has degree d.
Example 13.1.2 The polynomial f(x_1, x_2) = x_1^3 + 4x_1^2 x_2 − 7x_1 x_2^2 + 8x_2^3 is homogeneous of degree 3 over the real numbers. The polynomial g(x_1, x_2) = 3x_1^5 − 6x_1 x_2^2 + x_2^4 of degree 5 is not homogeneous; its homogenization is g^H(x_1, x_2, x_3) = 3x_1^5 − 6x_1 x_2^2 x_3^2 + x_2^4 x_3.
Exercise 763 Give the homogenizations of the following polynomials over the real numbers:
(a) f(x_1) = x_1^4 + 3x_1^2 − x_1 + 9,
(b) g(x_1, x_2) = 5x_1^8 − x_1 x_2^3 + 4x_1^2 x_2^5 − 7x_2^4,
(c) h(x_1, x_2, x_3) = x_1^2 x_2^3 x_3^5 − x_1 x_3^{12} + x_2^4 x_3.
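The homogenization (13.1) and its inverse are mechanical enough to sketch in code. The Python fragment below (the dict-of-exponent-tuples encoding is our own, not from the text) pads each term with a power of the extra variable, and recovers f from f^H by setting that variable to 1; it reproduces g^H from Example 13.1.2.

```python
def homogenize(poly):
    """Return f^H as in (13.1): pad each term with the new last variable up to degree d."""
    d = max(sum(e) for e in poly)                     # total degree of f
    return {e + (d - sum(e),): c for e, c in poly.items()}

def dehomogenize(poly):
    """Set the last variable to 1 to recover the original polynomial."""
    out = {}
    for e, c in poly.items():
        out[e[:-1]] = out.get(e[:-1], 0) + c
    return {e: c for e, c in out.items() if c != 0}

# g(x1, x2) = 3*x1^5 - 6*x1*x2^2 + x2^4, as in Example 13.1.2
g = {(5, 0): 3, (1, 2): -6, (0, 4): 1}
gH = homogenize(g)   # expected: 3*x1^5 - 6*x1*x2^2*x3^2 + x2^4*x3
```

Round-tripping through `dehomogenize` illustrates the one-to-one correspondence of Theorem 13.1.3(iv).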
Notice that f H (x1 , . . . , xn , 1) = f (x1 , . . . , xn ) and so we can easily recover the original
polynomial from its homogenization. Also, if we begin with a homogeneous polynomial
g(x_1, ..., x_{n+1}) of degree d, then f(x_1, ..., x_n) = g(x_1, ..., x_n, 1) is a polynomial of degree d or less; furthermore, if f has degree k ≤ d, then g = x_{n+1}^{d−k} f^H. In this manner we
obtain a one-to-one correspondence between polynomials in n variables of degree d or less
and homogeneous polynomials of degree d in n + 1 variables. This proves the last statement
of the following theorem; the proof of the remainder is left as an exercise.
Theorem 13.1.3 Let g(x1 , . . . , xn+1 ) be a homogeneous polynomial of degree d over F and
f (x1 , . . . , xn ) any polynomial of degree d over F.
(i) If α ∈ F, then g(αx_1, ..., αx_{n+1}) = α^d g(x_1, ..., x_{n+1}).
(ii) f (x1 , . . . , xn ) = 0 if and only if f H (x1 , . . . , xn , 1) = 0.
(iii) If (x_1 : x_2 : ··· : x_{n+1}) = (x_1′ : x_2′ : ··· : x_{n+1}′), then g(x_1, ..., x_{n+1}) = 0 if and only if g(x_1′, ..., x_{n+1}′) = 0. In particular, f^H(x_1, ..., x_{n+1}) = 0 if and only if f^H(x_1′, ..., x_{n+1}′) = 0.
(iv) There is a one-to-one correspondence between polynomials in n variables of degree d
or less and homogeneous polynomials of degree d in n + 1 variables.
Exercise 764 Prove the first three parts of Theorem 13.1.3.
The implications of Theorem 13.1.3 will prove quite important. Part (ii) implies that the
zeros of f in A^n(F) correspond precisely to the affine points in P^n(F) that are zeros of f^H. f^H may have other zeros that are points at infinity in P^n(F). Furthermore, part (iii) indicates that
the concept of a point in Pn (F) being a zero of a homogeneous polynomial is well-defined.
Exercise 765 Give the one-to-one correspondence between the homogeneous polynomials
of degree 2 in two variables over F2 and the polynomials of degree 2 or less (the zero
polynomial is not included) in one variable over F2 as indicated in the proof of Theorem
13.1.3(iv).
13.2 Some classical codes
Before studying algebraic geometry codes it is helpful to examine some classical codes that
will motivate the definition of algebraic geometry codes.
13.2.1 Generalized Reed–Solomon codes revisited
In Section 5.2 we first defined Reed–Solomon codes as special cases of BCH codes. Theorem 5.2.3 gave an alternative formulation of narrow-sense RS codes, which we now recall.
For k ≥ 0, P_k denotes the set of polynomials of degree less than k, including the zero polynomial, in F_q[x]. If α is a primitive nth root of unity in F_q where n = q − 1, then the
code
C = {( f(1), f(α), f(α^2), ..., f(α^{q−2})) | f ∈ P_k}
is the narrow-sense [n, k, n − k + 1] RS code over Fq . From Section 5.3, we saw that C
could be extended to an [n + 1, k, n − k + 2] code Ĉ given by

Ĉ = {( f(1), f(α), f(α^2), ..., f(α^{q−2}), f(0)) | f ∈ P_k}.
We now recall the definition of the generalized Reed–Solomon (GRS) codes. Let n be
any integer with 1 ≤ n ≤ q, γ = (γ0 , . . . , γn−1 ) an n-tuple of distinct elements of Fq , and
v = (v0 , . . . , vn−1 ) an n-tuple of nonzero elements of Fq . Let k be an integer with 1 ≤ k ≤ n.
Then the codes
GRS_k(γ, v) = {(v_0 f(γ_0), v_1 f(γ_1), ..., v_{n−1} f(γ_{n−1})) | f ∈ P_k}
are the GRS codes. The narrow-sense RS code C is the GRS code with n = q − 1, γ_i = α^i, and v_i = 1; the extended code Ĉ is the GRS code with n = q, γ_i = α^i for 0 ≤ i ≤ q − 2, γ_{q−1} = 0, and v_i = 1. By Theorem 13.1.3(iv), there is a one-to-one correspondence between
the homogeneous polynomials of degree k − 1 in two variables over Fq and the nonzero
polynomials of P_k. We denote the homogeneous polynomials of degree k − 1 in two variables over F_q together with the zero polynomial by L_{k−1}. If f(x_1) ∈ P_k with f nonzero of degree d ≤ k − 1, then g(x_1, x_2) = x_2^{k−1−d} f^H ∈ L_{k−1} and g(x_1, 1) = f^H(x_1, 1) = f(x_1) by (13.1). Therefore we can redefine the GRS codes as follows using homogeneous polynomials and projective points. Let P_i = (γ_i, 1), which is a representative of the projective
point (γi : 1) ∈ P1 (Fq ). Then
GRS_k(γ, v) = {(v_0 g(P_0), v_1 g(P_1), ..., v_{n−1} g(P_{n−1})) | g ∈ L_{k−1}}.   (13.2)
If we wish we can also include in our list a representative, say (1, 0), of the point at infinity.
Exercise 766 indicates that if the representatives of the projective points are changed, an
equivalent code is obtained.
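To make the evaluation definition concrete, here is a small Python sketch of GRS_k(γ, v) over the prime field F_7, where field arithmetic is just integer arithmetic mod 7; the parameters γ, v, and k are invented for illustration. Brute-force enumeration confirms the [n, k, n − k + 1] parameters quoted above.

```python
from itertools import product

p = 7
gamma = [0, 1, 2, 3, 4, 5]          # n = 6 distinct elements of F_7
v = [1, 3, 1, 2, 1, 1]              # nonzero scaling factors
k = 3

def evaluate(f):
    """Codeword (v_0 f(gamma_0), ..., v_{n-1} f(gamma_{n-1})) for f of degree < k."""
    return tuple(v[i] * sum(f[j] * pow(gamma[i], j, p) for j in range(k)) % p
                 for i in range(len(gamma)))

code = {evaluate(f) for f in product(range(p), repeat=k)}
weights = sorted(sum(x != 0 for x in cw) for cw in code)
# |code| = 7^3 = 343, and the minimum nonzero weight is n - k + 1 = 4 (an MDS code)
```

The evaluation map is injective since a nonzero polynomial of degree less than k cannot vanish at all six points, which is why the code has exactly 7^3 codewords.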
Exercise 766 Let P_i′ = (β_i γ_i, β_i) with β_i ≠ 0. Prove that the code

{(v_0 g(P_0′), v_1 g(P_1′), ..., v_{n−1} g(P_{n−1}′)) | g ∈ L_{k−1}}

is equivalent to the code in (13.2).
13.2.2 Classical Goppa codes
Classical Goppa codes were introduced by V. D. Goppa in 1970 [103, 104]. These codes are
generalizations of narrow-sense BCH codes and subfield subcodes of certain GRS codes.
We first give an alternate construction of narrow-sense BCH codes of length n over Fq
to motivate the definition of Goppa codes. Let t = ord_n(q) and let β be a primitive nth root
of unity in Fq t . Choose δ > 1 and let C be the narrow-sense BCH code of length n and
designed distance δ. Then c(x) = c0 + c1 x + · · · + cn−1 x n−1 ∈ Fq [x]/(x n − 1) is in C if
and only if c(β j ) = 0 for 1 ≤ j ≤ δ − 1. We have
(x^n − 1) ∑_{i=0}^{n−1} c_i/(x − β^{−i}) = ∑_{i=0}^{n−1} c_i ∑_{ℓ=0}^{n−1} x^ℓ (β^{−i})^{n−1−ℓ} = ∑_{ℓ=0}^{n−1} x^ℓ ∑_{i=0}^{n−1} c_i (β^{ℓ+1})^i.   (13.3)
Because c(β^{ℓ+1}) = 0 for 0 ≤ ℓ ≤ δ − 2, the right-hand side of (13.3) is a polynomial whose lowest degree term has degree at least δ − 1. Hence the right-hand side of (13.3) can be written as x^{δ−1} p(x), where p(x) is a polynomial in F_{q^t}[x]. Thus we can say that c(x) ∈ F_q[x]/(x^n − 1) is in C if and only if
∑_{i=0}^{n−1} c_i/(x − β^{−i}) = x^{δ−1} p(x)/(x^n − 1)
or equivalently
∑_{i=0}^{n−1} c_i/(x − β^{−i}) ≡ 0 (mod x^{δ−1}).
This equivalence modulo x^{δ−1} means that if the left-hand side is written as a rational function a(x)/b(x), then the numerator a(x) will be a multiple of x^{δ−1} (noting that the denominator is b(x) = x^n − 1).
This last equivalence is the basis of our definition of classical Goppa codes. To define a
Goppa code of length n over F_q, first fix an extension field F_{q^t} of F_q; we do not require that t = ord_n(q). Let L = {γ_0, γ_1, ..., γ_{n−1}} be a set of n distinct elements of F_{q^t}. Let G(x) ∈ F_{q^t}[x] with G(γ_i) ≠ 0 for 0 ≤ i ≤ n − 1. Then the Goppa code Γ(L, G) is the set of vectors
c_0 c_1 ··· c_{n−1} ∈ F_q^n such that

∑_{i=0}^{n−1} c_i/(x − γ_i) ≡ 0 (mod G(x)).   (13.4)
Again this means that when the left-hand side is written as a rational function, the numerator is a multiple of G(x). (Working modulo G(x) is like working in the ring F_{q^t}[x]/(G(x)); requiring that G(γ_i) ≠ 0 guarantees that x − γ_i is invertible in this ring.) G(x) is the Goppa polynomial of Γ(L, G). The narrow-sense BCH code of length n and designed distance δ is the Goppa code Γ(L, G) with L = {1, β^{−1}, β^{−2}, ..., β^{1−n}} and G(x) = x^{δ−1}.
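As a toy illustration of definition (13.4), all parameters invented here: take t = 1 so that F_{q^t} = F_q = F_5 and no extension-field arithmetic is needed. With G(x) = x − 4 of degree 1, divisibility of the numerator by G(x) is just the vanishing of the numerator at x = 4, which collapses (13.4) to one parity check ∑ c_i (4 − γ_i)^{−1} = 0 over F_5.

```python
from itertools import product

p = 5
L = [0, 1, 2, 3]                              # gamma_i, chosen so G(gamma_i) = gamma_i - 4 != 0
h = [pow((4 - x) % p, p - 2, p) for x in L]   # (4 - gamma_i)^{-1} by Fermat's little theorem

# c is in the code iff the numerator of sum_i c_i/(x - gamma_i) vanishes at x = 4,
# i.e. iff sum_i c_i * (4 - gamma_i)^{-1} = 0 in F_5
goppa = {c for c in product(range(p), repeat=len(L))
         if sum(ci * hi for ci, hi in zip(c, h)) % p == 0}
weights = sorted(sum(x != 0 for x in cw) for cw in goppa)
# one parity check over F_5: |code| = 5^3 = 125, and every nonzero codeword has weight >= 2
```

With t = 1 the subfield subcode step is invisible; the genuinely binary case with γ_i in F_8 behaves the same way but needs extension-field arithmetic.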
Exercise 767 Construct a binary Goppa code of length 4 using (13.4) as follows. Use the field F_8 given in Example 3.4.3 with primitive element α. Let L = {γ_0, γ_1, γ_2, γ_3} = {1, α, α^2, α^4} and G(x) = x − α^3.
(a) When writing the left-hand side of (13.4) as a rational function a(x)/b(x), the numerator a(x) begins c_0(x − α)(x − α^2)(x − α^4) + ···. Complete the rest of the numerator (but do not simplify).
(b) The numerator a(x) found in part (a) is a multiple of G(x) = x − α^3, which, in this case, is equivalent to saying a(α^3) = 0. Simplify a(α^3) to obtain an F_8-linear combination of c_0, c_1, c_2, and c_3. Hint: First compute α^3 − 1, α^3 − α, α^3 − α^2, and α^3 − α^4.
(c) There are 16 choices for c_0 c_1 c_2 c_3 ∈ F_2^4. Test each of these using the results of part (b) to find the codewords in Γ(L, G).
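A computational cross-check of this construction is easy to script. The sketch below builds F_8 from the primitive polynomial x^3 + x + 1; whether this matches the table of Example 3.4.3 is an assumption of this sketch, though the codeword count and weights checked below do not depend on the choice of primitive polynomial.

```python
from itertools import product

def gf8_mul(a, b):
    """Multiply two F_8 elements written as 3-bit integers, reducing by x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r

def gf8_inv(a):
    """a^{-1} = a^6, since a^7 = 1 for every nonzero a in F_8."""
    r = 1
    for _ in range(6):
        r = gf8_mul(r, a)
    return r

alpha = 0b010                               # the class of x
apow = [1]
for _ in range(6):
    apow.append(gf8_mul(apow[-1], alpha))   # apow[i] = alpha^i

L = [1, apow[1], apow[2], apow[4]]          # gamma_i = 1, alpha, alpha^2, alpha^4
h = [gf8_inv(apow[3] ^ g) for g in L]       # (alpha^3 - gamma_i)^{-1}; minus is XOR in char 2

# For binary c, condition (13.4) with G(x) = x - alpha^3 reduces to sum_i c_i h_i = 0 in F_8.
code = []
for c in product((0, 1), repeat=4):
    s = 0
    for ci, hi in zip(c, h):
        if ci:
            s ^= hi                         # addition in F_8 is XOR
    if s == 0:
        code.append(c)
```

The resulting code has exactly two codewords, the zero word and one word of weight 3, consistent with the bounds k ≥ n − wt = 1 and d ≥ w + 1 = 2 of Theorem 13.2.1.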
We now find a parity check matrix for Γ(L, G). Notice that
1/(x − γ_i) ≡ −G(γ_i)^{−1} (G(x) − G(γ_i))/(x − γ_i)  (mod G(x))
since, by comparing numerators, 1 ≡ −G(γ_i)^{−1}(G(x) − G(γ_i)) (mod G(x)). So by (13.4), c = c_0 c_1 ··· c_{n−1} ∈ Γ(L, G) if and only if
∑_{i=0}^{n−1} c_i (G(x) − G(γ_i))/(x − γ_i) G(γ_i)^{−1} ≡ 0 (mod G(x)).   (13.5)

Suppose G(x) = ∑_{j=0}^{w} g_j x^j with g_j ∈ F_{q^t}, where w = deg(G(x)). Then
(G(x) − G(γ_i))/(x − γ_i) G(γ_i)^{−1} = G(γ_i)^{−1} ∑_{j=1}^{w} g_j ∑_{k=0}^{j−1} x^k γ_i^{j−1−k} = G(γ_i)^{−1} ∑_{k=0}^{w−1} x^k ∑_{j=k+1}^{w} g_j γ_i^{j−1−k}.
Therefore, by (13.5), setting the coefficients of x^k equal to 0, in the order k = w − 1, w − 2, ..., 0, we have that c ∈ Γ(L, G) if and only if H c^T = 0, where

H =
[ h_0 g_w                          h_1 g_w                          ···   h_{n−1} g_w                          ]
[ h_0 (g_{w−1} + g_w γ_0)          h_1 (g_{w−1} + g_w γ_1)          ···   h_{n−1} (g_{w−1} + g_w γ_{n−1})      ]
[   ⋮                                                                                                          ]
[ h_0 ∑_{j=1}^{w} g_j γ_0^{j−1}    h_1 ∑_{j=1}^{w} g_j γ_1^{j−1}    ···   h_{n−1} ∑_{j=1}^{w} g_j γ_{n−1}^{j−1} ]   (13.6)
with h_i = G(γ_i)^{−1}. By Exercise 768, H can be row reduced to the w × n matrix H′, where

H′ =
[ G(γ_0)^{−1}               G(γ_1)^{−1}               ···   G(γ_{n−1})^{−1}               ]
[ G(γ_0)^{−1} γ_0           G(γ_1)^{−1} γ_1           ···   G(γ_{n−1})^{−1} γ_{n−1}       ]
[   ⋮                                                                                     ]
[ G(γ_0)^{−1} γ_0^{w−1}     G(γ_1)^{−1} γ_1^{w−1}     ···   G(γ_{n−1})^{−1} γ_{n−1}^{w−1} ]   (13.7)
Exercise 768 Show that the matrix H of (13.6) can be row reduced to H ′ of (13.7).
By (5.4) we see that H′ is the generator matrix for GRS_w(γ, v), where γ = (γ_0, γ_1, ..., γ_{n−1}) and v_i = G(γ_i)^{−1}. This shows that Γ(L, G) is the subfield subcode GRS_w(γ, v)^⊥|_{F_q} of the dual of GRS_w(γ, v); see Section 3.8. Since GRS_w(γ, v)^⊥ is also a GRS code by Theorem 5.3.3, a Goppa code is a subfield subcode of a GRS code.
The entries of H′ are in F_{q^t}. By choosing a basis of F_{q^t} over F_q, each element of F_{q^t} can be represented as a t × 1 column vector over F_q. Replacing each entry of H′ by its corresponding column vector, we obtain a tw × n matrix H′′ over F_q which has the property that c ∈ F_q^n is in Γ(L, G) if and only if H′′ c^T = 0. Compare this to Theorem 4.4.3.
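The matrices (13.6) and (13.7) can be checked numerically. In the sketch below (all parameters invented, over the prime field F_7 so that t = 1), both matrices are built directly from the displayed formulas and shown to have the same nullspace, which is what the row reduction of Exercise 768 asserts.

```python
from itertools import product

p = 7
L = [0, 1, 2, 3]
g = [3, 1, 2]                                 # G(x) = 2x^2 + x + 3, so w = 2
w = 2
Gval = [(2 * x * x + x + 3) % p for x in L]   # G(gamma_i), nonzero for every gamma_i here
Ginv = [pow(v, p - 2, p) for v in Gval]       # G(gamma_i)^{-1} by Fermat's little theorem

# H from (13.6): the row for the coefficient of x^k, taken in the order k = w-1, ..., 0,
# has entries h_i * sum_{j=k+1}^{w} g_j gamma_i^{j-1-k}
H = [[(Ginv[i] * sum(g[j] * pow(L[i], j - 1 - k, p) for j in range(k + 1, w + 1))) % p
      for i in range(len(L))] for k in range(w - 1, -1, -1)]

# H' from (13.7): row k has entries G(gamma_i)^{-1} * gamma_i^k, k = 0, ..., w-1
Hp = [[(Ginv[i] * pow(L[i], k, p)) % p for i in range(len(L))] for k in range(w)]

def nullspace(M):
    return {c for c in product(range(p), repeat=len(L))
            if all(sum(r[i] * c[i] for i in range(len(L))) % p == 0 for r in M)}

ns_H, ns_Hp = nullspace(H), nullspace(Hp)     # equal, as Exercise 768 predicts
```

The common nullspace here is a [4, 2] code of minimum weight 3, matching d ≥ w + 1 from Theorem 13.2.1.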
We have the following bounds on the dimension and minimum distance of a Goppa code.
Theorem 13.2.1 In the notation of this section, the Goppa code Γ(L, G) with deg(G(x)) = w is an [n, k, d] code, where k ≥ n − wt and d ≥ w + 1.
Proof: The rows of H′′ may be dependent and hence this matrix has rank at most wt. Hence Γ(L, G) has dimension at least n − wt. If a nonzero codeword c ∈ Γ(L, G) has weight w or less, then when the left-hand side of (13.4) is written as a rational function, the numerator has degree w − 1 or less; this numerator is nonzero, since evaluating it at γ_i for any i with c_i ≠ 0 gives a nonzero value. But the numerator must be a multiple of G(x), which is impossible as deg(G(x)) = w. Hence d ≥ w + 1.
Exercise 769 This is a continuation of Exercise 767. Do the following:
(a) Find the parity check matrix H′ from (13.7) for the Goppa code found in Exercise 767.
(b) Give the parity check equation from the matrix in (a). How does this equation compare with the equation found in Exercise 767(b)?
(c) By writing each element of F_8 as a binary 3-tuple from Example 3.4.3, construct the parity check matrix H′′ as described above.
(d) Use the parity check matrix from part (c) to find the vectors in Γ(L, G) directly.
Exercise 770 Construct a binary Goppa code of length 7 from a parity check matrix as follows. Use the field F_8 given in Example 3.4.3 with primitive element α. Let L = {0, 1, α, α^2, α^4, α^5, α^6} and G(x) = x − α^3.
(a) Find the parity check matrix H′ from (13.7) for Γ(L, G).
(b) Find the binary parity check matrix H′′ from H′ as discussed above.
(c) What other code is Γ(L, G) equivalent to? What are the dimension and minimum weight of Γ(L, G), and how do these compare to the bounds of Theorem 13.2.1?
(d) Based on what you have discovered in part (c), state and prove a theorem about a binary Goppa code Γ(L, G) of length 2^t − 1 with G(x) = x − α^r for some r with 0 ≤ r ≤ 2^t − 2 and L = {α^i | 0 ≤ i ≤ 2^t − 2, i ≠ r} ∪ {0}, where α is a primitive element of F_{2^t}.
Another formulation for Goppa codes can be given as follows. Let R be the vector space
of all rational functions f (x) = a(x)/b(x) with coefficients in Fq t , where a(x) and b(x) are
relatively prime and satisfy two requirements. First, the zeros of a(x) include the zeros of
G(x) with at least the same multiplicity as in G(x); second, the only possible zeros of b(x), that is, the poles of f(x), are among γ_0, ..., γ_{n−1}, each with multiplicity at most one. Any
rational function f(x) has a Laurent series expansion about γ_i. The rational functions f(x) in R have Laurent series expansion

∑_{j=−1}^{∞} f_j (x − γ_i)^j

about γ_i, where f_{−1} ≠ 0 if f(x) has a pole at γ_i and f_{−1} = 0 otherwise. The residue of f(x) at γ_i, denoted Res_{γ_i} f, is the coefficient f_{−1}. Let

C = {(Res_{γ_0} f, Res_{γ_1} f, ..., Res_{γ_{n−1}} f) | f(x) ∈ R}.   (13.8)
Exercise 771 shows that Γ(L, G) is the subfield subcode C|_{F_q}.
Exercise 771 Prove that Γ(L, G) = C|_{F_q}, where C is given in (13.8).
13.2.3 Generalized Reed–Muller codes
In Section 1.10 we introduced the binary Reed–Muller codes. We can construct similar
codes, called generalized Reed–Muller codes, over other fields using a construction parallel
to that of the GRS codes. Kasami, Lin, and Peterson [163] first introduced the primitive
generalized Reed–Muller codes which we study here; the nonprimitive codes, which we do
not investigate, were presented in [349]. Primitive and nonprimitive codes are also examined
extensively in [4, 5].
Let m be a positive integer and let P_1, P_2, ..., P_n be the n = q^m points in the affine space A^m(F_q). For any integer r with 0 ≤ r ≤ m(q − 1), let F_q[x_1, x_2, ..., x_m]_r be the polynomials in F_q[x_1, x_2, ..., x_m] of degree r or less together with the zero polynomial. Define the rth order generalized Reed–Muller, or GRM, code of length n = q^m to be

R_q(r, m) = {( f(P_1), f(P_2), ..., f(P_n)) | f ∈ F_q[x_1, x_2, ..., x_m]_r}.
If a term in f has a factor x_i^e where e = q + d ≥ q, then the factor can be replaced by x_i^{1+d} without changing the value of f at any point P_j because β^{q+d} = β^q β^d = β β^d, as β^q = β for all β ∈ F_q. Thus we see that

R_q(r, m) = {( f(P_1), f(P_2), ..., f(P_n)) | f ∈ F_q[x_1, x_2, ..., x_m]_r^*},

where F_q[x_1, x_2, ..., x_m]_r^* is all polynomials in F_q[x_1, x_2, ..., x_m]_r with no term having an exponent q or higher on any variable. Clearly, this is a vector space over F_q with a basis

B_q(r, m) = {x_1^{e_1} ··· x_m^{e_m} | 0 ≤ e_i < q, e_1 + e_2 + ··· + e_m ≤ r}.
Obviously, {( f (P1 ), f (P2 ), . . . , f (Pn )) | f ∈ Bq (r, m)} spans Rq (r, m). These codewords
are in fact independent; this fact is shown in the binary case in Exercise 772.
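A sketch of the evaluation construction (the ordering of points and basis monomials is our choice, not the text's): build R_2(1, 2) by evaluating the monomials of B_2(1, 2) at the four points of A^2(F_2), then enumerate the code and check the familiar RM parameters [4, 3, 2].

```python
from itertools import product

q, m, r = 2, 2, 1
points = [(0, 0), (1, 0), (0, 1), (1, 1)]     # our chosen ordering of A^2(F_2)
basis = [e for e in product(range(q), repeat=m) if sum(e) <= r]   # exponent tuples of B_2(1, 2)

def mono(e, P):
    """Evaluate x1^e1 * x2^e2 at the point P, with the convention 0^0 = 1."""
    val = 1
    for ei, xi in zip(e, P):
        val = val * pow(xi, ei) % q
    return val

G = [[mono(e, P) for P in points] for e in basis]   # rows of a generator matrix
code = {tuple(sum(mi * gi for mi, gi in zip(msg, col)) % q for col in zip(*G))
        for msg in product(range(q), repeat=len(G))}
# R_2(1, 2) should be the [4, 3, 2] binary Reed-Muller code R(1, 2)
```

The same loop with larger q, m, r builds any GRM code, at the cost of exponential enumeration.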
Example 13.2.2 We construct generator matrices for R2 (1, 2), R2 (2, 2), and R2 (2, 3); from
these matrices we conclude that these codes are the original RM codes R(1, 2), R(2, 2), and
R(2, 3). The basis B 2 (1, 2) of F2 [x1 , x2 ]∗1 is { f 1 , f 2 , f 3 }, where f 1 (x1 , x2 ) = 1, f 2 (x1 , x2 ) =
x1 , and f 3 (x1 , x2 ) = x2 . Let P1 = (0, 0), P2 = (1, 0), P3 = (0, 1), and P4 = (1, 1) be the
points in A2 (F2 ). By evaluating f 1 , f 2 , f 3 in that order at P1 , P2 , P3 , P4 , we obtain the
matrix
G′(1, 2) =
[ 1 1 1 1 ]
[ 0 1 0 1 ]
[ 0 0 1 1 ].
This matrix clearly has independent rows and generates the same code R(1, 2) as G(1, 2)
did in Section 1.10. The basis B 2 (2, 2) of F2 [x1 , x2 ]∗2 is { f 1 , f 2 , f 3 , f 4 }, where f 4 (x1 , x2 ) =
x1 x2 . Using the same points and by evaluating f 1 , f 2 , f 3 , f 4 in order, we obtain the matrix
G′(2, 2) =
[ 1 1 1 1 ]
[ 0 1 0 1 ]
[ 0 0 1 1 ]
[ 0 0 0 1 ].
This matrix again clearly has independent rows and generates the same code as I4 does,
which is R(2, 2). The basis B 2 (2, 3) of F2 [x1 , x2 , x3 ]∗2 is { f 1 , . . . , f 7 }, where f 5 (x1 , x2 , x3 ) =
x3 , f 6 (x1 , x2 , x3 ) = x1 x3 , and f 7 (x1 , x2 , x3 ) = x2 x3 . Let P1 = (0, 0, 0), P2 = (1, 0, 0), P3 =
(0, 1, 0), P4 = (1, 1, 0), P5 = (0, 0, 1), P6 = (1, 0, 1), P7 = (0, 1, 1), P8 = (1, 1, 1) be the
points in A3 (F2 ). By evaluating f 1 , . . . , f 7 in that order we obtain the matrix
G′(2, 3) =
[ 1 1 1 1 1 1 1 1 ]
[ 0 1 0 1 0 1 0 1 ]
[ 0 0 1 1 0 0 1 1 ]
[ 0 0 0 1 0 0 0 1 ]
[ 0 0 0 0 1 1 1 1 ]
[ 0 0 0 0 0 1 0 1 ]
[ 0 0 0 0 0 0 1 1 ].
Notice that

G′(2, 3) =
[ G′(2, 2)   G′(2, 2) ]
[ O          G′(1, 2) ],

which indicates that G′(2, 3) generates R(2, 3) by (1.5) and (1.7).
Exercise 772 By ordering the points P1 , P2 , . . . , Pn correctly with n = 2m and the basis
B 2 (r, m) of F2 [x1 , . . . , xm ]r∗ appropriately, a generator matrix for R2 (r, m) decomposes to
show that this code can be obtained from the (u|u + v) construction.
(a) Study the construction in Example 13.2.2 and show how to generalize this inductively
to obtain a generator matrix G ′ (r, m) of R2 (r, m) from generator matrices G ′ (r, m − 1)
and G ′ (r − 1, m − 1) for R2 (r, m − 1) and R2 (r − 1, m − 1) in the form
G′(r, m) =
[ G′(r, m − 1)   G′(r, m − 1)     ]
[ O              G′(r − 1, m − 1) ].
Hint: In Example 13.2.2 observe that f 1 , . . . , f 4 do not involve x3 , but f 5 = f 1 x3 , f 6 =
f 2 x3 , f 7 = f 3 x3 do.
(b) Explain why part (a) shows that R2 (r, m) is the ordinary binary RM code R(r, m).
(c) Explain why part (a) shows that {( f (P1 ), f (P2 ), . . . , f (Pn )) | f ∈ B 2 (r, m)} is independent.
13.3 Algebraic curves
The Reed–Solomon and generalized Reed–Muller codes have similar definitions, namely
as the set of n-tuples ( f (P1 ), . . . , f (Pn )) where P1 , . . . , Pn are fixed points in affine space
and f runs through a specified set of functions. We can do the same for generalized Reed–
Solomon codes with a minor modification to (13.2) where the points are in projective space.
It is these constructions that were generalized by Goppa in [105, 106]. The codes that he
constructed are now termed algebraic geometry codes or geometric Goppa codes.
In order to describe Goppa’s construction, we need to study affine and projective curves
with an emphasis on the points on these curves in given fields. We will limit our discussion
to curves in the plane. These curves will be described in one of two ways. An affine plane
curve X is the set of affine points (x, y) ∈ A2 (F), denoted X f (F), where f (x, y) = 0 with
f ∈ F[x, y]. A projective plane curve X is the set of projective points (x : y : z) ∈ P2 (F),
also denoted X f (F), where f (x, y, z) = 0 with f a homogeneous polynomial in F[x, y, z].1
Suppose that f ∈ F[x, y]. If f H is the homogenization of f , then X f H (F) is called the
projective closure of X f (F). In a sense, the only difference between X f (F) and X f H (F)
is that points at infinity have been added to X f (F) to produce X f H (F); this follows from
Theorem 13.1.3(ii). In many situations the function defining a curve will be defined over a
field F but we will want the curve to be points in A2 (E) or P2 (E) where E is an extension
field of F. In that case we will attach the field to the notation for the curve, namely X f (E).
At other times we will define the curve simply by an equation without specifying the field
and thus dropping the field from the notation.
We will need the concept of partial derivatives analogous to the same notion developed
in calculus. If f(x, y) = ∑_{i,j} a_{ij} x^i y^j ∈ F[x, y], the partial derivative f_x of f with respect to x is

f_x(x, y) = ∑_{i,j} i a_{ij} x^{i−1} y^j.
The partial derivative with respect to y is defined analogously; if f (x, y, z) ∈ F[x, y, z],
the partial derivative with respect to z can also be defined. A point (x0 , y0 ) on an affine
curve X f (F) (where f (x0 , y0 ) = 0) is singular if f x (x0 , y0 ) = f y (x0 , y0 ) = 0. A point on
X f (F) is nonsingular or simple if it is not singular. Analogous definitions hold for projective
curves. A curve that has no singular points is called nonsingular, regular, or smooth.
¹ Note that when defining polynomials in two or three variables, we will not use subscripts; the variables will be denoted x and y, or x, y, and z.

Example 13.3.1 The Fermat curve F_m(F_q) is a projective plane curve over F_q defined by f(x, y, z) = x^m + y^m + z^m = 0. As f_x = m x^{m−1}, f_y = m y^{m−1}, and f_z = m z^{m−1}, and since (0 : 0 : 0) is not a projective point, singular points on F_m(F_q) can exist only when gcd(m, q) ≠ 1; in that case every point on the curve is singular. So if m and q are relatively prime, F_m(F_q) is nonsingular.
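Singularity at rational points can be tested by brute force (the helper names below are ours). The sketch checks f = f_x = f_y = f_z = 0 at every point of P^2(F_q); note that this only detects singular points with coordinates in F_q, so singular points over extension fields require enlarging the search field. The two Fermat curves below behave as Example 13.3.1 predicts.

```python
q = 2   # the search field F_q

def proj_points(q):
    """One normalized representative for each point of P^2(F_q)."""
    pts = [(1, y, z) for y in range(q) for z in range(q)]
    pts += [(0, 1, z) for z in range(q)]
    pts.append((0, 0, 1))
    return pts

def fermat(m):
    """The Fermat curve x^m + y^m + z^m = 0 and its formal partial derivatives mod q."""
    f = lambda x, y, z: (pow(x, m, q) + pow(y, m, q) + pow(z, m, q)) % q
    fx = lambda x, y, z: m * pow(x, m - 1, q) % q
    fy = lambda x, y, z: m * pow(y, m - 1, q) % q
    fz = lambda x, y, z: m * pow(z, m - 1, q) % q
    return f, (fx, fy, fz)

def singular_points(f, partials):
    return [P for P in proj_points(q)
            if f(*P) == 0 and all(d(*P) == 0 for d in partials)]

sing3 = singular_points(*fermat(3))   # gcd(3, 2) = 1: expect no singular points
sing2 = singular_points(*fermat(2))   # gcd(2, 2) = 2: every curve point is singular
```

For m = 2 all three partial derivatives vanish identically in characteristic 2, so all three rational points of the curve are found singular.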
Exercise 773 The Fermat curve F m (Fq ) is defined in Example 13.3.1. Consider the curve
F 3 (Fq ) given by x 3 + y 3 + z 3 = 0.
(a) Find the three projective points (x : y : z) of P2 (F2 ) on F 3 (F2 ).
(b) Find the nine projective points (x : y : z) of P2 (F4 ) on F 3 (F4 ).
(c) Find the nine projective points (x : y : z) of P2 (F8 ) on F 3 (F8 ). The field F8 can be
found in Example 3.4.3.
Example 13.3.2 Let q = r 2 where r is a prime power. The Hermitian curve Hr (Fq ) is
the projective plane curve over Fq defined by f (x, y, z) = x r +1 − y r z − yz r = 0. In Exercise 774, you will show that Hr (Fq ) has only one point at infinity, namely (0 : 1 : 0). We show
that there are r 3 affine points (x : y : 1) of P2 (Fq ) on Hr (Fq ). As z = 1, x r +1 = y r + y. But
y r + y = Tr2 (y), where Tr2 : Fr 2 → Fr is the trace map from Fr 2 to Fr . By Lemma 3.8.5,
Tr2 is a nonzero Fr -linear transformation from Fr 2 to Fr . As its image is a nonzero subspace of the 1-dimensional space Fr , Tr2 is surjective. So its kernel is a 1-dimensional Fr subspace of Fr 2 , implying there are r elements y ∈ Fr 2 with Tr2 (y) = 0. When Tr2 (y) = 0,
then x r +1 = 0 has one solution x = 0; this leads to r affine points on Hr (Fq ). If x ∈ Fr 2 ,
x r +1 ∈ Fr as r 2 − 1 = (r + 1)(r − 1) and the nonzero elements of Fr in Fr 2 are precisely
those satisfying β^{r−1} = 1. When y is one of the r^2 − r elements of F_{r^2} with Tr_2(y) ≠ 0, there are r + 1 solutions x ∈ F_{r^2} with x^{r+1} = Tr_2(y). This leads to (r^2 − r)(r + 1) = r^3 − r
more affine points. Hence, Hr (Fq ) has r 3 − r + r = r 3 affine points and a total of r 3 + 1
projective points.
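The count r^3 + 1 can be verified directly for r = 2. The F_4 encoding below, with w^2 = w + 1, is our own construction, not the table of Example 3.4.3: enumerate P^2(F_4) and keep the points on x^3 − y^2 z − y z^2 = 0, recalling that minus is plus in characteristic 2.

```python
def f4_mul(a, b):
    """Multiply F_4 elements coded as 2-bit integers c0 + c1*w, where w^2 = w + 1."""
    r = a if b & 1 else 0
    if b & 2:
        aw = a << 1                # multiply a by w
        if aw & 0b100:
            aw ^= 0b111            # reduce by w^2 + w + 1
        r ^= aw
    return r

def f4_pow(a, n):
    r = 1
    for _ in range(n):
        r = f4_mul(r, a)
    return r

def hermitian(x, y, z):
    """x^3 - y^2 z - y z^2; minus is plus (XOR) in characteristic 2."""
    return f4_pow(x, 3) ^ f4_mul(f4_pow(y, 2), z) ^ f4_mul(y, f4_pow(z, 2))

# one normalized representative per point of P^2(F_4)
pts = [(1, y, z) for y in range(4) for z in range(4)]
pts += [(0, 1, z) for z in range(4)] + [(0, 0, 1)]

on_curve = [P for P in pts if hermitian(*P) == 0]
affine = [P for P in on_curve if P[2] != 0]    # z != 0 marks the affine points
# expected: r^3 + 1 = 9 points in all, 8 of them affine
```

This also answers Exercise 774(c) numerically once the representatives with z ≠ 0 are rescaled to the form (x : y : 1).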
Exercise 774 Let q = r 2 where r is a prime power. The Hermitian curve Hr (Fq ) is defined
in Example 13.3.2.
(a) Show that Hr (Fq ) is nonsingular. Hint: In Fq , r = 0 as r is a multiple of the characteristic
of Fq .
(b) Show that (0 : 1 : 0) is the only point at infinity on Hr (Fq ).
(c) Find the eight affine points (x : y : 1) of P2 (F4 ) on H2 (F4 ).
(d) Find the 64 affine points (x : y : 1) of P2 (F16 ) on H4 (F16 ). Table 5.1 gives the field
F16 .
Exercise 775 The Klein quartic K4 (Fq ) is the projective curve over Fq defined by the
fourth degree homogeneous polynomial equation f (x, y, z) = x 3 y + y 3 z + z 3 x = 0.
(a) Find the three partial derivatives of f .
(b) Show that if Fq has characteristic 3, K4 (Fq ) is nonsingular.
(c) If (x : y : z) is a singular point of K4 (Fq ), show that x 3 y = −3y 3 z using f y (x, y, z) = 0,
and that z 3 x = 9y 3 z using f x (x, y, z) = 0 and f y (x, y, z) = 0.
(d) If (x : y : z) is a singular point of K4 (Fq ), show that 7y 3 z = 0 using x 3 y + y 3 z + z 3 x =
0 and part (c).
(e) Using part (d), show that if Fq does not have characteristic 7, then K4 (Fq ) is nonsingular.
(f ) Find the three projective points in P2 (F2 ) on K4 (F2 ).
(g) Find the five projective points in P2 (F4 ) on K4 (F4 ).
(h) Find the 24 projective points in P2 (F8 ) on K4 (F8 ). The field F8 can be found in
Example 3.4.3.
When examining the points on a curve, we will need to know their degrees. The degree
m of a point depends on the field under consideration. Let q = pr where p is a prime, and
let m ≥ 1 be an integer. The map σ_q : F_{q^m} → F_{q^m} given by σ_q(α) = α^q is an automorphism of F_{q^m} that fixes F_q; σ_q = σ_p^r, where σ_p is the Frobenius map defined in Section 3.6. If
P = (x, y) or P = (x : y : z) have coordinates in Fq m , let σq (P) denote (σq (x), σq (y)) or
(σq (x) : σq (y) : σq (z)), respectively. By Exercise 776, in the projective case, it does not
matter which representative is chosen for a projective point when applying σq to the point.
Suppose f is in either Fq [x, y] or Fq [x, y, z], with f homogeneous in the latter case. So
f defines either an affine or projective curve over Fq or over Fq m . Exercise 776 shows
that if P is a point on X f (Fq m ), so is σq (P). Therefore {σqi (P) | i ≥ 0} is a set of points
on X f (Fq m ); however, there are at most m distinct points in this set as σqm is the identity
automorphism of Fq m . This allows us to extend our notion of a point. A point P on X f (Fq )
of degree m over Fq is a set of m distinct points P = {P0 , . . . , Pm−1 } with Pi on X f (Fq m ),
where Pi = σqi (P0 ); the degree of P over Fq is denoted deg(P). The points of degree one on
X_f(F_q) are called rational or F_q-rational points.² This definition is motivated by the well-known situation with polynomials over the real numbers. Conjugation is the automorphism of the complex numbers that fixes the real numbers. The polynomial p(x) = x^2 − 4x + 5 with real coefficients has no real roots but has two complex conjugate roots, 2 + i and 2 − i. In our new terminology, the roots of p(x) are lumped together as the pair {2 + i, 2 − i}, which is a point on p(x) = 0 of degree 2 over the real numbers (even though each individual root
is not real). In the definition of a point P = {P0 , . . . , Pm−1 } of degree m over Fq , the Pi s are
required to be distinct. It is possible, beginning with a point P0 in A2 (Fq m ) or P2 (Fq m ), that
Pi = σqi (P0 ) for 0 ≤ i ≤ m − 1 are not distinct. In this case the distinct Pi s form a point of
smaller degree; see Exercise 777(c).
Exercise 776 Do the following:
(a) Let (x : y : z) = (x ′ : y ′ : z ′ ) ∈ P2 (Fq m ). Prove that (σq (x) : σq (y) : σq (z)) = (σq (x ′ ) :
σq (y ′ ) : σq (z ′ )).
(b) Let f be either a polynomial in Fq [x, y] or a homogeneous polynomial in Fq [x, y, z].
Show that if P is in either A2 (Fq m ) or in P2 (Fq m ), respectively, with f (P) = 0, then
f (σq (P)) = 0.
Example 13.3.3 Let f(x, y, z) = x^3 + x z^2 + z^3 + y^2 z + y z^2 ∈ F_2[x, y, z]. The curve determined by this equation is an example of an elliptic curve. A point at infinity on X_f(F), where F is any extension field of F_2, satisfies z = 0; hence x^3 = 0, so x = 0 and y can be any nonzero value. Thus there is only one point at infinity on X_f(F), namely P_∞ = (0 : 1 : 0). The point at infinity is of degree 1 (that is, it is rational) over F_2 as its coordinates are in F_2.
² The objects that we previously called points are actually points of degree 1 over F_q. Notice that points of degree m over F_q are fixed by σ_q, just as elements of F_q are; this is why these points are considered to be on X_f(F_q).
When considering the affine points, we can assume z = 1 and so

x^3 + x + 1 = y^2 + y.   (13.9)
What are the affine points of degree 1 over F2 on the curve? Since x and y are in F2 , we see
that y 2 + y = 0 but x 3 + x + 1 = 1. So the only point of degree 1 (that is, the only rational
point) over F2 is the point at infinity. There are points of degree 2 over F2 . Here x and y
are in F4 . If y = 0 or 1, then y 2 + y = 0. By (13.9), x 3 + x + 1 = 0, which has no solution
in F_4. If y = ω or ω̄, then y^2 + y = 1 and so x^3 + x = x(x + 1)^2 = 0 by (13.9). Thus the points of degree 2 over F_2 on the curve are P_1 = {p_1, p_1′}, where p_1 = (0 : ω : 1) and p_1′ = (0 : ω̄ : 1), and P_2 = {p_2, p_2′}, where p_2 = (1 : ω : 1) and p_2′ = (1 : ω̄ : 1), noting that σ_2 switches the two projective points in each pair.
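The orbit description of degrees can be checked by machine for this curve (the F_4 encoding is ours): list the points of P^2(F_4) on the curve, then group them into orbits of σ_2; the orbit sizes are the degrees over F_2. This recovers the counts found above: one rational point and two points of degree 2.

```python
def f4_mul(a, b):
    """Multiply F_4 elements coded as 2-bit integers c0 + c1*w, where w^2 = w + 1."""
    r = a if b & 1 else 0
    if b & 2:
        aw = a << 1
        if aw & 0b100:
            aw ^= 0b111            # reduce by w^2 + w + 1
        r ^= aw
    return r

def f4_pow(a, n):
    r = 1
    for _ in range(n):
        r = f4_mul(r, a)
    return r

def ell(x, y, z):                  # x^3 + x z^2 + z^3 + y^2 z + y z^2 over F_4
    return (f4_pow(x, 3) ^ f4_mul(x, f4_pow(z, 2)) ^ f4_pow(z, 3)
            ^ f4_mul(f4_pow(y, 2), z) ^ f4_mul(y, f4_pow(z, 2)))

# one normalized representative per point of P^2(F_4)
pts = [(1, y, z) for y in range(4) for z in range(4)]
pts += [(0, 1, z) for z in range(4)] + [(0, 0, 1)]
on_curve = [P for P in pts if ell(*P) == 0]

def frob(P):
    """sigma_2 applied coordinatewise; it preserves the normalized representative."""
    return tuple(f4_mul(c, c) for c in P)

orbits = {frozenset({P, frob(P)}) for P in on_curve}   # sigma_2^2 is the identity on F_4
degrees = sorted(len(o) for o in orbits)               # orbit sizes = degrees over F_2
```

Replacing F_4 by F_8 or F_16 in the same way would recover the degree-3 and degree-4 points of Exercise 777.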
Exercise 777 Let f (x, y, z) = x 3 + x z 2 + z 3 + y 2 z + yz 2 ∈ F2 [x, y, z] determine an elliptic curve as in Example 13.3.3.
(a) Prove that X f (F) is nonsingular for any extension field F of F2 .
(b) Find the four points of degree 3 on the curve over F2 . The field F8 can be found in
Example 3.4.3.
(c) Find the five points of degree 4 on the curve over F2 . The field F16 can be found in
Table 5.1. (Along the way you may rediscover the points of degree 2.)
Exercise 778 This exercise uses the results of Exercise 773. Find the points on the Fermat
curve defined by x 3 + y 3 + z 3 = 0 of degrees 1, 2, and 3 over F2 .
Exercise 779 This exercise uses the results of Exercise 774.
(a) There are eight affine points (x : y : 1) of P2 (F4 ) on the Hermitian curve H2 (F4 ). Find
the two affine points on this curve of degree 1 over F2 and the three affine points of
degree 2 also over F2 .
(b) There are 64 affine points (x : y : 1) of P2 (F16 ) on the Hermitian curve H4 (F16 ). Find
the two affine points on this curve of degree 1 over F2 , the single affine point of degree
2 over F2 , and the 15 affine points of degree 4 over F2 .
Exercise 780 This exercise uses the results of Exercise 775. Find the points on the Klein
quartic defined by x 3 y + y 3 z + z 3 x = 0 of degrees 1, 2, and 3 over F2 .
When defining algebraic geometry codes, we will need to be able to compute the points
on the intersection of two curves. In addition to the degree of a point of intersection, we
need to know the intersection multiplicity, which we shorten to multiplicity, at the point of
intersection. We do not formally define multiplicity because the definition is too technical.
As the following example illustrates, we can compute multiplicity similarly to the way
multiplicity of zeros is computed for polynomials in one variable.
Example 13.3.4 In Example 13.3.3 and Exercise 777, we found some of the projective
points from P2 (Fq ), where q is a power of 2, on the elliptic curve determined by x 3 + x z 2 +
z 3 + y 2 z + yz 2 = 0. We now explore how this curve intersects other curves.
• Intersection with x = 0. We either have z = 0 or can assume that z = 1. In the former case we obtain the point at infinity P_∞ = (0 : 1 : 0); in the latter, the equation y^2 + y + 1 = 0 must be satisfied, leading to the two points p_1 = (0 : ω : 1) and p_1′ = (0 : ω̄ : 1) in P^2(F_4).
We can view this in one of two ways. Over F4 or extension fields of F4 , the curves
x 3 + x z 2 + z 3 + y 2 z + yz 2 = 0 and x = 0 intersect at the three points P∞ , p1 , and p1′ .
Each of these points has degree 1, and the intersection multiplicity at each of these points
is 1. Over F2 or extension fields of F2 not containing F4 , the points p1 and p1′ combine to
form a point of degree 2, and so in this case we have two points of intersection: P∞ and
P1 = { p1 , p1′ }. The point P∞ has degree 1 and P1 has degree 2. Of course, the intersection
multiplicity at each of these points is still 1.
• Intersection with x^2 = 0. We can view x^2 = 0 as the union of the line x = 0 with itself. Therefore every point on the intersection of the elliptic curve with x^2 = 0 occurs twice as frequently as it did on the single line x = 0. Thus over F_4 or extension fields of F_4 there are three points of intersection, P_∞, p_1, and p_1′; each point continues to have degree 1 but now the intersection multiplicity at each point is 2. Similarly, over F_2 or extension fields not containing F_4, there are two points of intersection, P_∞ and P_1 = {p_1, p_1′}. They have degree 1 and degree 2, respectively, and the intersection multiplicity at each point is now 2.
• Intersection with z = 0. We saw from Example 13.3.3 that there was only one point P_∞ on the elliptic curve with z = 0. This point has degree 1 over F_2 or any extension field of F_2. Plugging z = 0 into the equation defining our elliptic curve, we obtain x^3 = 0. Hence P_∞ occurs on the intersection with multiplicity 3.
• Intersection with z^2 = 0. As in the case x^2 = 0, we double the multiplicities obtained in the case z = 0. Thus over any field of characteristic 2, the elliptic curve intersects the union of two lines each given by z = 0 at P_∞. This point has degree 1 over F_2 or any extension field of F_2. The intersection multiplicity at P_∞ doubles and therefore is 6.
Intersection with y = 0. Here z = 0 is not possible as then x would have to be 0. So z = 1
and we must have x 3 + x + 1 = 0. The only solutions occur in F8 and lead to the points
p3 = (α : 0 : 1), p3′ = (α 2 : 0 : 1), and p3′′ = (α 4 : 0 : 1) in P2 (F8 ). (Compare this to your
answer in Exercise 777(b).) Thus over F8 or extension fields of F8 there are three points
of intersection, p3 , p3′ , and p3′′ ; each has degree 1. The intersection multiplicity at each
point is 1. Over F2 or extension fields not containing F8 , the three points combine into
a single point of intersection P3 = { p3 , p3′ , p3′′ } of degree 3. Its intersection multiplicity
remains 1.
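The root-finding over F8 in this last case can be checked by brute force. The sketch below is not from the text: it models F8 as 3-bit integers with multiplication reduced modulo the irreducible polynomial x^3 + x + 1 (an assumed representation in which α is the integer 2), and it recovers exactly the three roots α, α^2, and α^4 = α^2 + α of x^3 + x + 1.

```python
# F8 as 3-bit integers: bit i holds the coefficient of alpha^i, with products
# reduced modulo the irreducible polynomial x^3 + x + 1 (binary 1011).
MOD = 0b1011

def gf8_mul(a, b):
    """Carry-less product of two F8 elements, reduced mod x^3 + x + 1."""
    p = 0
    for i in range(3):
        if (b >> i) & 1:
            p ^= a << i
    for d in range(5, 2, -1):          # clear bits 5, 4, 3 by reduction
        if (p >> d) & 1:
            p ^= MOD << (d - 3)
    return p

# Roots of t^3 + t + 1 in F8: alpha = 2, alpha^2 = 4, alpha^4 = alpha^2 + alpha = 6.
roots = sorted(t for t in range(8) if gf8_mul(t, gf8_mul(t, t)) ^ t ^ 1 == 0)
assert roots == [2, 4, 6]              # in particular, no roots in F2 = {0, 1}
```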
Exercise 781 As in Example 13.3.4, find the intersection of the elliptic curve x 3 + x z 2 +
z 3 + y 2 z + yz 2 = 0 with the curves given below over F2 and its extension fields. In addition
give the degree of each point and the intersection multiplicity at each point of intersection.
(a) x + z = 0.
(b) x 2 + z 2 = 0. Hint: x 2 + z 2 = (x + z)2 .
(c) y 2 = 0.
(d) x z = 0. Hint: This is the union of two lines x = 0 and z = 0.
(e) yz = 0.
(f ) x y = 0.
(g) z 2 + y 2 + yz = 0.
Example 13.3.4 illustrates a certain uniformity in the number of points of intersection
if counted properly. To count “properly” one must consider both multiplicity and degree;
namely, an intersection of multiplicity m at a point of degree d counts md times. The total
count of the points of intersection is computed by adding the products of all the multiplicities
and degrees. From Example 13.3.4, the elliptic curve intersects x = 0 either in three points
of degree 1 each with multiplicity 1, giving a total count of 1 · 1 + 1 · 1 + 1 · 1 = 3 points
of intersection, or in one point of degree 1 intersecting with multiplicity 1 and one point of
degree 2 intersecting with multiplicity 1, giving a total count of 1 · 1 + 1 · 2 = 3 points of
intersection. We also found that the elliptic curve intersects x 2 = 0 either in three points
of degree 1 with intersection multiplicity 2, giving a total count of 2 · 1 + 2 · 1 + 2 · 1 = 6
points of intersection, or in one point of degree 1 with multiplicity 2 and one point of degree 2
also with multiplicity 2, giving a total count of 2 · 1 + 2 · 2 = 6 points of intersection. Notice
that these counts, 3 and 6, equal the product of the degree 3 of the homogeneous polynomial
defining the elliptic curve and the degree, 1 or 2, of the homogeneous polynomial defining
the curve intersecting the elliptic curve. This illustrates the following theorem due to
Bézout.³
Theorem 13.3.5 (Bézout) Let f (x, y, z) and g(x, y, z) be homogeneous polynomials over
F of degrees d f and dg , respectively. Suppose that f and g have no common nonconstant
polynomial factors. Then X f and X g intersect in d f dg points when counted with multiplicity
and degree.
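The "count with multiplicity and degree" can be tabulated mechanically. In the sketch below (the data are transcribed from Example 13.3.4; the dictionary layout and names are my own), each intersection is stored as a list of (degree, multiplicity) pairs together with the degree d_g of the intersecting curve, and the weighted total always equals d_f · d_g with d_f = 3 for the cubic elliptic curve:

```python
# (d_g, [(degree, multiplicity), ...]) for each intersection in Example 13.3.4.
d_f = 3  # degree of the homogeneous cubic defining the elliptic curve
intersections = {
    "x = 0 over F4":   (1, [(1, 1), (1, 1), (1, 1)]),
    "x = 0 over F2":   (1, [(1, 1), (2, 1)]),
    "x^2 = 0 over F4": (2, [(1, 2), (1, 2), (1, 2)]),
    "x^2 = 0 over F2": (2, [(1, 2), (2, 2)]),
    "z = 0":           (1, [(1, 3)]),
    "z^2 = 0":         (2, [(1, 6)]),
    "y = 0 over F8":   (1, [(1, 1), (1, 1), (1, 1)]),
    "y = 0 over F2":   (1, [(3, 1)]),
}
for name, (d_g, pts) in intersections.items():
    # total count = sum of (degree * multiplicity) over the points
    assert sum(deg * mult for deg, mult in pts) == d_f * d_g, name
```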
Exercise 782 Verify that Bézout’s Theorem agrees with the results of Exercise 781.
This discussion naturally leads to the concept of a divisor on a curve. Let X be a curve over
the field F. A divisor D on X over F is a formal sum D = Σ_P n_P P, where n_P is an integer
and P is a point of arbitrary degree on X, with only a finite number of the n_P being nonzero.
The divisor is effective if n_P ≥ 0 for all P. The support supp(D) of the divisor D = Σ_P n_P P
is {P | n_P ≠ 0}. The degree of the divisor D = Σ_P n_P P is deg(D) = Σ_P n_P deg(P).
Example 13.3.6 In Example 13.3.3 we found the points of degrees 1 and 2 over F2 on the
elliptic curve x 3 + x z 2 + z 3 + y 2 z + yz 2 = 0. So one divisor on that curve over F2 is D1 =
7P∞ + 4P1 − 9P2 ; this is not effective but D2 = 7P∞ + 4P1 + 9P2 is an effective divisor.
Both D1 and D2 have support {P∞ , P1 , P2 }. Also deg(D1 ) = 7 · 1 + 4 · 2 − 9 · 2 = −3
and deg(D2 ) = 7 · 1 + 4 · 2 + 9 · 2 = 33. Note that 7P∞ + 4 p1 − 9 p2′ is not a divisor over
F2 because p1 and p2′ are not defined over F2 ; we are required to list the entire pair P1
and P2 in a divisor over F2 . However, 7P∞ + 4 p1 − 9 p2′ is a divisor over F4 with support
{P∞ , p1 , p2′ } and degree 7 · 1 + 4 · 1 − 9 · 1 = 2.
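Divisors invite a small data structure: a map from points to integer coefficients, with point degrees kept in a side table. A sketch using the divisors of Example 13.3.6 (the point names and helper names are my own):

```python
# A divisor is a dict {point: integer coefficient}; degrees of the points of
# Example 13.3.6 over F2: P_inf has degree 1, P1 and P2 have degree 2.
point_degree = {"P_inf": 1, "P1": 2, "P2": 2}

def deg(divisor):
    """deg(D) = sum of n_P * deg(P)."""
    return sum(n * point_degree[P] for P, n in divisor.items())

def is_effective(divisor):
    return all(n >= 0 for n in divisor.values())

def support(divisor):
    return {P for P, n in divisor.items() if n != 0}

D1 = {"P_inf": 7, "P1": 4, "P2": -9}
D2 = {"P_inf": 7, "P1": 4, "P2": 9}
assert deg(D1) == -3 and not is_effective(D1)
assert deg(D2) == 33 and is_effective(D2)
assert support(D1) == support(D2) == {"P_inf", "P1", "P2"}
```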
We can use the notation of divisors to describe the intersection of two curves easily. If X 1
and X 2 are projective curves, then the intersection divisor over F of X 1 and X 2 , denoted
X 1 ∩ X 2 , is Σ_P n_P P, where the summation runs over all points over F on both X 1 and X 2
and n P is the multiplicity of the point on the two curves. If these curves are defined by
homogeneous polynomials, with no common nonconstant polynomial factors, of degrees
d f and dg , then the intersection divisor has degree d f dg by Bézout’s Theorem.
³ Bézout's Theorem dates back to 1779 and originated in remarks of Newton and Maclaurin. A version of this
theorem for the complex plane had been proved earlier by Euler in 1748 and Cramer in 1750.
Example 13.3.7 In Example 13.3.4 we found the intersection of the elliptic curve determined by x 3 + x z 2 + z 3 + y 2 z + yz 2 = 0 with five other curves. Their intersection divisors
are:
• Intersection with x = 0: P∞ + P1.
• Intersection with x² = 0: 2P∞ + 2P1.
• Intersection with z = 0: 3P∞.
• Intersection with z² = 0: 6P∞.
• Intersection with y = 0: P3.
Exercise 783 Find the intersection divisor of the elliptic curve in Exercise 781 with each
of the other curves in that exercise.
We close this section with one final concept. In computing the minimum distance and
dimension of algebraic geometry codes, the genus of a plane projective curve will play an
important role. The genus of a curve is actually connected to a topological concept of the
same name. Without giving this topological definition, we state a theorem, called Plücker’s
Formula⁴ (see [135]), which gives the genus of a nonsingular projective plane curve; for
our purposes, this can serve as the definition of genus.
Theorem 13.3.8 (Plücker’s Formula) The genus g of a nonsingular projective plane curve
determined by a homogeneous polynomial of degree d ≥ 1 is
g = (d − 1)(d − 2)/2.
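Since the formula is so simple, a two-line function covers every plane curve in this section; the checks below restate genera quoted elsewhere in the text (the cubics of genus 1, the Klein quartic of genus 3, and the degree-5 Hermitian curve of genus 6):

```python
def genus(d):
    """Plucker's Formula for a nonsingular projective plane curve of degree d."""
    return (d - 1) * (d - 2) // 2

assert genus(1) == genus(2) == 0   # lines and conics have genus 0
assert genus(3) == 1               # the elliptic, Fermat, and Hermitian cubics
assert genus(4) == 3               # the Klein quartic
assert genus(5) == 6               # the Hermitian quintic of Exercise 793
```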
Exercise 784 Find the genus of the following curves which have already been shown to
be nonsingular.
(a) The Fermat curve of Example 13.3.1.
(b) The Hermitian curve of Example 13.3.2.
(c) The Klein quartic of Exercise 775.
(d) The elliptic curve of Example 13.3.3.
Let p(x) ∈ F[x] be a polynomial of degree 3 that has no repeated roots. Show that the
following curves are nonsingular and compute their genus.
(e) The curve X f H (F) where f (x, y) = y 2 − p(x) and F does not have characteristic 2.
(f ) The curve X f H (F) where f (x, y) = y 2 + y + p(x) and F has characteristic 2.
13.4 Algebraic geometry codes
We are about ready to define the codes studied by Goppa that generalize the classical codes
from Section 13.2. Recall that these codes are n-tuples consisting of functions evaluated at
n fixed points where the functions run through a certain vector space. To define the more
general codes we need to determine the n points, which will turn out to be points on a curve,
⁴ This formula is actually a special case of a series of more general formulas developed by Plücker in 1834 and
generalized by Max Noether in 1875 and 1883.
and the vector space of functions, which will actually be a vector space of equivalence
classes of rational functions related to the function defining the curve.
We first concentrate on this latter vector space of functions. Let p(x, y, z) be a homogeneous polynomial of positive degree that defines a projective curve X over a field F.
Define the field of rational functions on X over F by
F(X) = { g/h | g, h ∈ F[x, y, z] homogeneous of equal degree with p ∤ h } ∪ {0}, taken modulo the equivalence relation ≈X.
This notation means that F(X ) is actually a collection of equivalence classes of rational
functions where the numerator and denominator are homogeneous of equal degree. By
requiring that the denominator not be a multiple of p(x, y, z), at least one of the points on
X is not a zero of the denominator. To define the equivalence we say that g/ h ≈X g ′ / h ′ if
and only if gh ′ − g ′ h is a polynomial multiple of p(x, y, z). Furthermore, we say g/ h ≈X 0
precisely when g(x, y, z) is a polynomial multiple of p(x, y, z). By Exercise 785, F(X ) is
actually a field containing F as a subfield.
Exercise 785 Verify that F(X ) is a field containing F as a subfield (where F is identified
with the constant polynomials).
Let f = g/h ∈ F(X) with f ≉X 0. Then the divisor of f is
div(f) = (X ∩ X g) − (X ∩ X h),   (13.10)
that is, the difference in the intersection divisors X ∩ X g and X ∩ X h . Essentially, div( f )
is a mechanism to keep track easily of the zeros and poles of f that are on X together with
their multiplicities and orders, respectively; in effect, div( f ) is the “zeros of f on X ” minus
the “poles of f on X .” There will be cancellation in (13.10) whenever a point on X is both
a zero and a pole; ultimately, if P appears with positive coefficient in div( f ), it is a zero
of f and a pole if it appears with a negative coefficient. If g and h have degrees dg and
dh , respectively, then by Bézout’s Theorem, deg(div( f )) = d p dg − d p dh = 0 as dg = dh .
Since f is only a representative of an equivalence class, we need to know that div( f ) is
independent of the representative chosen and therefore is well-defined. This is indeed true;
we do not prove it but illustrate it in Exercise 786.
Exercise 786 Let X be the elliptic curve defined by x 3 + x z 2 + z 3 + y 2 z + yz 2 = 0
over a field of characteristic 2. Let f = g/ h and f ′ = g ′ / h ′ , where g(x, y, z) = x 2 + z 2 ,
h(x, y, z) = z 2 , g ′ (x, y, z) = z 2 + y 2 + yz, and h ′ (x, y, z) = x z. As in Example 13.3.3, let
P∞ = (0 : 1 : 0) and P2 = {(1 : ω : 1), (1 : ω̄ : 1)}. Example 13.3.7 and Exercise 783 will
help with parts (b) and (c).
(a) Show that f ≈X f ′ .
(b) Show that div( f ) = 2P2 − 4P∞ using (13.10).
(c) Show that div( f ′ ) = 2P2 − 4P∞ using (13.10) with f ′ , g ′ , and h ′ in place of f , g,
and h.
There is a partial ordering of divisors on a curve given by D = Σ_P n_P P ⪰ D′ = Σ_P n′_P P
provided n_P ≥ n′_P for all P. Thus D is effective if D ⪰ 0. We now define the space of
functions that will help determine our algebraic geometry codes. Let D be a divisor on a
projective curve X defined over F. Let
L(D) = { f ∈ F(X) | f ≉X 0, div(f) + D ⪰ 0 } ∪ {0}.
In other words for f ≉X 0, f ∈ L(D) if and only if div(f) + D is effective. This is a vector
space over F as seen in Exercise 787. Suppose D = Σ_P n_P P. A rational function f is in
L(D) provided that any pole P of f has order not exceeding n_P and provided that any zero
P of f has multiplicity at least −n_P.
Exercise 787 Prove that L(D) is a vector space over F.
The next theorem gives some simple facts about L(D).
Theorem 13.4.1 Let D be a divisor on a projective curve X . The following hold:
(i) If deg(D) < 0, then L(D) = {0}.
(ii) The constant functions are in L(D) if and only if D ⪰ 0.
(iii) If P is a point on X with P ∉ supp(D), then P is not a pole of any f ∈ L(D).
Proof: For (i), if f ∈ L(D) with f ≉X 0, then div(f) + D ⪰ 0. In particular, deg(div(f) +
D) ≥ 0; but deg(div(f) + D) = deg(div(f)) + deg(D) = deg(D), contradicting deg(D) < 0.
Let f ≉X 0 be a constant function. If f ∈ L(D) then div(f) + D ⪰ 0. But div(f) = 0
when f is a constant function. So D ⪰ 0. Conversely, if D ⪰ 0, then div(f) + D ⪰ 0
provided div(f) = 0, which is the case if f is a constant function. This proves (ii).
For (iii), if P is a pole of f ∈ L(D) with P ∉ supp(D), then the coefficient of P in
the divisor div(f) + D of X is negative, contradicting the requirement that div(f) + D is
effective for f ∈ L(D).
With one more bit of notation we will finally be able to define algebraic geometry
codes. Recall that the projective plane curve X is defined by p(x, y, z) = 0 where we now
assume that the field F is Fq . Let D be a divisor on X ; choose a set P = {P1 , . . . , Pn } of n
distinct Fq -rational points on X such that P ∩ supp(D) = ∅. Order the points in P and let
evP : L(D) → Fqn be the evaluation map defined by
evP ( f ) = ( f (P1 ), f (P2 ), . . . , f (Pn )).
In this definition, we need to be careful. We must make sure evP is well-defined, as the
rational functions are actually representatives of equivalence classes. If f ∈ L(D), then Pi
is not a pole of f by Theorem 13.4.1(iii). However, if f is represented by g/ h, h may
still have a zero at Pi occurring a certain number of times in X h ∩ X . Since Pi is not a
pole of f , Pi occurs at least this many times in X g ∩ X . If we choose g/ h to represent
f, f(Pi) is really 0/0, a situation we must avoid. It can be shown that for any f ∈ L(D),
we can choose a representative g/h where h(Pi) ≠ 0; we will always make such a choice.
Suppose now that f has two such representatives g/h ≈X g′/h′ where h(Pi) ≠ 0 and
h′(Pi) ≠ 0. Then gh′ − g′h is a polynomial multiple of p and p(Pi) = 0, implying that
g(Pi)h′(Pi) = g′(Pi)h(Pi). Since h(Pi) and h′(Pi) are nonzero, g(Pi)/h(Pi) =
g ′ (Pi )/ h ′ (Pi ). Thus evP is well-defined on L(D). Also f (Pi ) ∈ Fq for any f ∈ L(D) as the
coefficients of the rational function and the coordinates of Pi are in Fq . Hence the image
evP ( f ) of f is indeed contained in Fqn .
Exercise 788 Prove that the map evP is linear.
With this notation the algebraic geometry code associated to X , P, and D is defined to
be
C(X , P, D) = {evP ( f ) | f ∈ L(D)}.
Exercise 788 implies that C(X , P, D) is a linear code over Fq . By Theorem 13.4.1(i), we will
only be interested in codes where deg(D) ≥ 0 as otherwise L(D) = {0} and C(X , P, D) is
the zero code. We would like some information about the dimension and minimum distance
of algebraic geometry codes. An upper bound on the dimension of the code is the dimension
of L(D). In the case that X is nonsingular, a major tool used to find the dimension of L(D)
is the Riemann–Roch Theorem, one version of which is the following.
Theorem 13.4.2 (Riemann–Roch) Let D be a divisor on a nonsingular projective plane
curve X over Fq of genus g. Then
dim(L(D)) ≥ deg(D) + 1 − g.
Furthermore, if deg(D) > 2g − 2, then
dim(L(D)) = deg(D) + 1 − g.
The next theorem gives conditions under which we know the dimension of algebraic
geometry codes exactly and have a lower bound on their minimum distance. The reader is
encouraged to compare its proof to that of Theorem 5.3.1.
Theorem 13.4.3 Let D be a divisor on a nonsingular projective plane curve X over Fq of
genus g. Let P be a set of n distinct Fq -rational points on X such that P ∩ supp(D) = ∅.
Furthermore, assume 2g − 2 < deg(D) < n. Then C(X , P, D) is an [n, k, d] code over Fq
where k = deg(D) + 1 − g and d ≥ n − deg(D). If { f 1 , . . . , f k } is a basis of L(D), then
f1(P1)  f1(P2)  · · ·  f1(Pn)
f2(P1)  f2(P2)  · · ·  f2(Pn)
  ⋮       ⋮              ⋮
fk(P1)  fk(P2)  · · ·  fk(Pn)
is a generator matrix for C(X , P, D).
Proof: By the Riemann–Roch Theorem, dim(L(D)) = deg(D) + 1 − g as 2g − 2 <
deg(D). Hence this value is also the dimension k of C(X , P, D) provided we show that the
linear map evP (see Exercise 788) has trivial kernel. Let P = {P1 , . . . , Pn }. Suppose that
evP ( f ) = 0. Thus f (Pi ) = 0 for all i, implying that Pi is a zero of f and the coefficient of
Pi in div(f) is at least 1. But Pi ∉ supp(D), and therefore div(f) + D − P1 − · · · − Pn ⪰
0, implying that f ∈ L(D − P1 − · · · − Pn). However, deg(D) < n, which means that
deg(D − P1 − · · · − Pn ) < 0. By Theorem 13.4.1(i), L(D − P1 − · · · − Pn ) = {0}, showing that f = 0 and hence that evP has trivial kernel. We conclude that k = deg(D) + 1 − g.
This also shows that the matrix given in the statement of the theorem is a generator matrix
for C(X , P, D).
Now suppose that evP ( f ) has minimum nonzero weight d. Thus f (Pi j ) = 0 for some set
of n − d distinct indices {i j | 1 ≤ j ≤ n − d}. As above, f ∈ L(D − Pi1 − · · · − Pin−d ).
Since f ≠ 0, Theorem 13.4.1(i) shows that deg(D − Pi1 − · · · − Pin−d ) ≥ 0. Therefore,
deg(D) − (n − d) ≥ 0 or d ≥ n − deg(D).
Example 13.4.4 We show that the narrow-sense and the extended narrow-sense Reed–
Solomon codes are in fact algebraic geometry codes. Consider the projective plane curve X
over Fq given by z = 0. The points on the curve are (x : y : 0), which essentially form the
projective line. Let P∞ = (1 : 0 : 0). There are q remaining Fq -rational points on the line.
Let P0 be the Fq -rational point represented by (0 : 1 : 0). Let P1 , . . . , Pq−1 be the remaining
Fq -rational points. For the narrow-sense Reed–Solomon codes we will let n = q − 1 and
P = {P1 , . . . , Pq−1 }. For the extended narrow-sense Reed–Solomon codes we will let n = q
and P = {P0 , . . . , Pq−1 }. Fix k with 1 ≤ k ≤ n and let D = (k − 1)P∞ ; note that D = 0
when k = 1. Clearly, P ∩ supp(D) = ∅, X is nonsingular of genus g = 0 by Plücker’s
Formula, and deg(D) = k − 1. In particular, deg(D) > 2g − 2 and, by the Riemann–Roch
Theorem, dim(L(D)) = deg(D) + 1 − g = k.
We claim that
B = {1, x/y, x^2/y^2, . . . , x^{k−1}/y^{k−1}}
is a basis of L(D). First, div(x j /y j ) = j P0 − j P∞ . Thus div(x j /y j ) + D = j P0 +
(k − 1 − j)P∞ , which is effective provided 0 ≤ j ≤ k − 1. Hence, every function in B
is in L(D). We only need to show that B is independent over Fq . Suppose that
f = Σ_{j=0}^{k−1} a_j x^j/y^j ≈X 0.
Then f = g/h, where g(x, y, z) = Σ_{j=0}^{k−1} a_j x^j y^{k−1−j} and h(x, y, z) = y^{k−1}. By definition
of ≈X , g(x, y, z) is a polynomial multiple of z. Clearly, this polynomial multiple must be
0 as z does not occur in g(x, y, z), which shows that a j = 0 for 0 ≤ j ≤ k − 1. Thus B is
a basis of L(D).
Using this basis we see that every nonzero function in L(D) can be written as
f(x, y, z) = g(x, y, z)/y^d, where g(x, y, z) = Σ_{j=0}^{d} c_j x^j y^{d−j} with c_d ≠ 0 and d ≤ k − 1.
But g(x, y, z) is the homogenization (in Fq[x, y]) of m(x) = Σ_{j=0}^{d} c_j x^j. So the association
f(x, y, z) ↔ m(x) is a one-to-one correspondence between L(D) and the polynomials in
Fq[x] of degree k − 1 or less, together with the zero polynomial, which we previously denoted P_k. Furthermore, if β ∈ Fq, then m(β) = f(β, 1, 0); additionally, Theorem 13.1.3(i)
implies that f (β, 1, 0) = f (x0 , y0 , z 0 ), where (x0 : y0 : z 0 ) is any representation of
(β : 1 : 0) because f is the ratio of two homogeneous polynomials of equal degree d.⁵ Let
α be a primitive element of Fq . Order the points so that Pi is represented by (α i−1 : 1 : 0)
for 1 ≤ i ≤ q − 1; then order P as P1 , . . . , Pq−1 if n = q − 1 and as P1 , . . . , Pq−1 , P0 if
n = q. This discussion shows that
{(m(1), m(α), m(α 2 ), . . . , m(α q−2 )) | m ∈ P k }
= {( f (P1 ), f (P2 ), f (P3 ), . . . , f (Pq−1 )) | f ∈ L(D)}
and
{(m(1), m(α), . . . , m(α q−2 ), m(0)) | m ∈ P k }
= {( f (P1 ), f (P2 ), . . . , f (Pq−1 ), f (P0 )) | f ∈ L(D)}.
Hence from our presentation in Section 13.2.1, the narrow-sense and extended narrow-sense
Reed–Solomon codes are algebraic geometry codes.
Theorem 13.4.3 shows that both the narrow-sense and extended narrow-sense Reed–
Solomon codes have dimension deg(D) + 1 − g = k − 1 + 1 − 0 = k and minimum distance d ≥ n − deg(D) = n − k + 1. The Singleton Bound shows that d ≤ n − k + 1 and
hence d = n − k + 1; each code is MDS, as we already knew.
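The parameters derived in this example can be confirmed by brute force for a small field. The sketch below is illustrative only: it works over F7 (a prime field, so arithmetic is plain integers mod q, with q = 7, k = 3, and the primitive element α = 3 all assumed choices), builds the narrow-sense Reed–Solomon code of length n = q − 1 = 6 by evaluating every polynomial of degree less than k, and checks that the minimum distance is n − k + 1.

```python
from itertools import product

q, k = 7, 3
alpha = 3                                        # a primitive element mod 7
n = q - 1
points = [pow(alpha, i, q) for i in range(n)]    # 1, alpha, ..., alpha^(q-2)

def evaluate(m):
    """Evaluate the polynomial with coefficient list m (constant term first)."""
    return tuple(sum(c * pow(x, j, q) for j, c in enumerate(m)) % q
                 for x in points)

# All codewords: evaluations of the q^k polynomials of degree < k.
codewords = {evaluate(m) for m in product(range(q), repeat=k)}
assert len(codewords) == q ** k                  # the evaluation map is injective
d = min(sum(c != 0 for c in w) for w in codewords if any(w))
assert d == n - k + 1                            # MDS: d = n - k + 1 = 4
```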
Exercise 789 Let X be a projective curve of genus 0. Let D be a divisor on X such
that −2 < deg(D) < n. Prove that C(X , P, D) is MDS.
Exercise 790 In Example 13.4.4 we show that Reed–Solomon codes are algebraic geometry codes. Here you will show that generalized Reed–Solomon codes are also algebraic
geometry codes, as shown for example in [320].
(a) Let γ0 , γ1 , . . . , γn−1 be n distinct elements of Fq . Let v0 , v1 , . . . , vn−1 be n not necessarily distinct elements of Fq . Prove the Lagrange Interpolation Formula, which states
that there exists a polynomial p(x) ∈ Fq [x] of degree at most n − 1 such that p(γi ) = vi
given by
p(x) = Σ_{i=0}^{n−1} v_i Π_{j≠i} (x − γ_j)/(γ_i − γ_j).   (13.11)
For the remainder of the problem we assume the notation of part (a), with the additional
assumption that the vi s are nonzero and v = (v0 , . . . , vn−1 ). Also, let γ = (γ0 , . . . , γn−1 ).
Let X be the projective plane curve over Fq given by z = 0. Let h(x, y) be the homogenization in Fq[x, y] of the polynomial p(x) of degree d ≤ n − 1 given by (13.11); note
that h ≠ 0 as vi ≠ 0. Let u(x, y, z) = h(x, y)/y^d, which is an element of Fq(X). Let
P = {P1 , P2 , . . . , Pn } where Pi is a representative of the projective point (γi−1 : 1 : 0)
in P2 (Fq ). Let P∞ = (1 : 0 : 0). Finally, for k an integer with 1 ≤ k ≤ n, let D = (k − 1)P∞ − div(u).
(b) Prove that u(Pi ) = vi−1 for 1 ≤ i ≤ n.
⁵ Recall that in our formulation (13.2) of GRS codes, the exact code depended on the representation of projective
points chosen. The formulation of the narrow-sense and extended narrow-sense Reed–Solomon codes presented
here has the nice feature that the exact code is independent of the representation chosen for the projective points.
(c) Prove that P ∩ supp(D) = ∅.
(d) Prove that deg(D) = k − 1. Hint: Recall that the degree of the divisor of any element
in Fq (X ) is 0.
(e) Prove that dim(L(D)) = k and that
B = {u, u·x/y, u·x^2/y^2, . . . , u·x^{k−1}/y^{k−1}}
is a basis of L(D).
(f ) Prove that GRSk (γ, v) = C(X , P, D).
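The Lagrange Interpolation Formula of part (a) is easy to test numerically over a prime field, where field division is multiplication by a modular inverse (pow(den, q − 2, q), by Fermat's little theorem). A sketch, not part of the exercise, with q = 7 and arbitrary illustrative data points:

```python
q = 7  # a prime, so F_q is the integers mod q

def p_at(x, gammas, vs):
    """Evaluate p(x) = sum_i v_i * prod_{j != i} (x - gamma_j)/(gamma_i - gamma_j) mod q."""
    total = 0
    for i, (gi, vi) in enumerate(zip(gammas, vs)):
        num = den = 1
        for j, gj in enumerate(gammas):
            if j != i:
                num = num * (x - gj) % q
                den = den * (gi - gj) % q
        total = (total + vi * num * pow(den, q - 2, q)) % q
    return total

gammas = [0, 1, 2, 3, 4]   # distinct elements of F_7 (arbitrary choice)
vs     = [3, 1, 4, 1, 5]   # arbitrary target values
# The interpolation property p(gamma_i) = v_i from part (a):
assert all(p_at(g, gammas, vs) == v for g, v in zip(gammas, vs))
```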
Exercise 791 Let X be the elliptic curve over F2 defined by x 3 + x z 2 + z 3 + y 2 z + yz 2 =
0. By Exercise 777, X is nonsingular. The genus of X is 1. Let P∞ = (0 : 1 : 0), which
is the unique point at infinity on X by Example 13.3.3. Let D = k P∞ for some positive
integer k.
(a) Prove that dim(L(D)) = k.
(b) Compute div(x i y j /z i+ j ) where i and j are nonnegative integers (including the possibility that i = j = 0). Hint: Use Example 13.3.7.
(c) What condition must be satisfied for x i y j /z i+ j to be in L(D)?
(d) Find a basis of L(D) for k ≥ 1 using the functions in part (c). Hint: Compute this first
for k = 1, 2, 3, 4, 5 and then note that
x^3/z^3 + x/z + 1 ≈X y^2/z^2 + y/z.
(e) Let P be the four affine points on X in P2 (F4 ) (see Example 13.3.3 where these points
are p1 , p1′ , p2 , p2′ ). For each of the codes C(X , P, D) over F4 with 1 ≤ k ≤ 3, apply
Theorem 13.4.3 to find the dimension of the code, a lower bound on its minimum
distance, and a generator matrix. Also give an upper bound on the minimum distance
from the Singleton Bound.
(f ) Let P be the 12 affine points on X in P2 (F8 ) (see Exercise 777(b) where these points
were found in the process of computing the points of degree 3 over F2 on the curve).
For each of the codes C(X , P, D) over F8 with 1 ≤ k ≤ 11, apply Theorem 13.4.3
to find the dimension of the code, a lower bound on its minimum distance, and a
generator matrix. Also give an upper bound on the minimum distance from the Singleton
Bound.
Exercise 791 illustrates some subtle concepts about the fields involved. The computation
of L(D) involves divisors over some ground field Fq (F2 in the exercise). The divisor D
and the divisors of the functions in L(D) involve points of various degrees over Fq , that is,
sets of points whose coordinates are in possibly many different extension fields of Fq (F2 ,
F4 , and F8 in the exercise). When constructing an algebraic geometry code, an extension
field Fq m of Fq is fixed, possibly Fq itself, and it is over the field Fq m (F4 in part (e) of the
exercise and F8 in part (f )) that the code is defined. All the points in P must be rational
over Fq m .
Exercise 792 Let X be the Hermitian curve H2 (F4 ) over F4 of genus 1 defined by x 3 +
y 2 z + yz 2 = 0. By Exercise 774, X is nonsingular. In the same exercise the point at infinity
P∞ = (0 : 1 : 0) and the eight affine points on X in P2 (F4 ) are found; two of these affine
points are P0 = (0 : 0 : 1) and P1 = (0 : 1 : 1). Let D = k P∞ for some positive integer k.
(a) Prove that dim(L(D)) = k.
(b) Show that the intersection divisor of the Hermitian curve with the curve defined by
x = 0 is P∞ + P0 + P1 .
(c) Show that the intersection divisor of the Hermitian curve with the curve defined by
y = 0 is 3P0 .
(d) Find the intersection divisor of the Hermitian curve with the curve defined by z = 0.
(e) Compute div(x i y j /z i+ j ) where i and j are nonnegative integers (including the possibility that i = j = 0).
(f ) Show that x i y j /z i+ j is in L(D) if and only if 2i + 3 j ≤ k.
(g) Find a basis of L(D) for k ≥ 1 using the functions in part (f ). Hint: Show that you can
assume 0 ≤ i ≤ 2 since
x^3/z^3 ≈X y^2/z^2 + y/z.
(h) Let P be the eight affine points on X in P2 (F4 ) (see Exercise 774). For the code
C(X , P, D) over F4 with 1 ≤ k ≤ 7, apply Theorem 13.4.3 to find the dimension of
the code, a lower bound on its minimum distance, and a generator matrix. Also give an
upper bound on the minimum distance from the Singleton Bound.
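Part (f) turns the computation of dim(L(D)) into counting lattice points: with the reduction in the hint of part (g), one counts the monomials x^i y^j/z^{i+j} with 0 ≤ i ≤ 2 and 2i + 3j ≤ k. A sketch of that count (the function name, parameters, and the loop bound on j are my own):

```python
def dim_L(k, i_max=2, wi=2, wj=3):
    """Count pairs (i, j) with 0 <= i <= i_max, j >= 0, and wi*i + wj*j <= k."""
    return sum(1
               for i in range(i_max + 1)
               for j in range(k // wj + 1)
               if wi * i + wj * j <= k)

# Agrees with dim L(k P_inf) = k from part (a) for the Hermitian curve H2(F4).
assert [dim_L(k) for k in range(1, 8)] == [1, 2, 3, 4, 5, 6, 7]
# The same count with weights 4 and 5 and i <= 4 gives k - 5, matching the
# genus-6 Hermitian curve of Exercise 793 for k >= 11.
assert all(dim_L(k, i_max=4, wi=4, wj=5) == k - 5 for k in range(11, 40))
```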
Exercise 793 Let X be the Hermitian curve H4 (F16 ) over F16 of genus 6 defined by
x 5 + y 4 z + yz 4 = 0. By Exercise 774, X is nonsingular. In the same exercise the point at
infinity P∞ = (0 : 1 : 0) and the 64 affine points on X in P2 (F16 ) are found; two of these
affine points are P0 = (0 : 0 : 1) and P1 = (0 : 1 : 1). Let D = k P∞ for k ≥ 11.
(a) Prove that dim(L(D)) = k − 5.
(b) Show that the intersection divisor of the Hermitian curve with the curve defined by
x = 0 is P∞ + P0 + 3P1 .
(c) Show that the intersection divisor of the Hermitian curve with the curve defined by
y = 0 is 5P0 .
(d) Find the intersection divisor of the Hermitian curve with the curve defined by z = 0.
(e) Compute div(x i y j /z i+ j ) where i and j are nonnegative integers (including the possibility that i = j = 0).
(f ) Show that x i y j /z i+ j is in L(D) if and only if 4i + 5 j ≤ k.
(g) Find a basis of L(D) for k ≥ 11 using the functions in part (f ). Hint: Show that you can
assume 0 ≤ i ≤ 4 since
x^5/z^5 ≈X y^4/z^4 + y/z.
(h) Let P be the 64 affine points on X in P2 (F16 ) (see Exercise 774). For the code C(X , P, D)
over F16 with 11 ≤ k ≤ 63, apply Theorem 13.4.3 to find the dimension of the code
and a lower bound on its minimum distance. Also give an upper bound on the minimum
distance from the Singleton Bound.
Exercise 794 Let X be the Fermat curve over F2 of genus 1 defined by x 3 + y 3 + z 3 = 0.
By Example 13.3.1, X is nonsingular. In Exercises 773 and 778, the points of degrees 1,
2, and 3 over F2 on X are found. Let P∞ = (1 : 1 : 0) and P∞′ = {(ω : 1 : 0), (ω̄ : 1 : 0)}
be the points at infinity on X (see Exercise 773). Let D = k(P∞ + P∞′) for some positive
integer k.
(a) Prove that dim(L(D)) = 3k.
(b) Find the intersection divisor of the Fermat curve with the curve defined by x = 0.
(c) Find the intersection divisor of the Fermat curve with the curve defined by y = 0.
(d) Find the intersection divisor of the Fermat curve with the curve defined by z = 0.
(e) Compute div(x i y j /z i+ j ) where i and j are nonnegative integers (including the possibility that i = j = 0).
(f ) What condition must be satisfied for x i y j /z i+ j to be in L(D)?
(g) Find a basis of L(D) for k ≥ 1 using the functions in part (f). Hint: Show that you can
assume 0 ≤ i ≤ 2 since
x^3/z^3 ≈X y^3/z^3 + 1.
(h) Let P be the six affine points on X in P2 (F4 ) (see Exercise 773). For the code C(X , P, D)
over F4 with k = 1, apply Theorem 13.4.3 to find the dimension of the code, a lower
bound on its minimum distance, and a generator matrix. Also give an upper bound on
the minimum distance from the Singleton Bound.
(i) Let P be the eight affine points on X in P2 (F8 ) (see Exercise 773). For the two codes
C(X , P, D) over F8 with 1 ≤ k ≤ 2, apply Theorem 13.4.3 to find the dimension of
the code, a lower bound on its minimum distance, and a generator matrix. Also give an
upper bound on the minimum distance from the Singleton Bound.
Exercise 795 Let X be the Klein quartic over F2 of genus 3 defined by x^3 y + y^3 z + z^3 x =
0. By Exercise 775, X is nonsingular. In Exercises 775 and 780, the points on X of degrees
1, 2, and 3 are found. Let P∞ = (0 : 1 : 0) and P∞′ = (1 : 0 : 0) be the points at infinity on
X, and let P0 = (0 : 0 : 1) be the remaining F2-rational point on X (see Exercise 775). Let
D = k P∞ for some integer k ≥ 5.
(a) Prove that dim(L(D)) = k − 2.
(b) Show that the intersection divisor of the Klein quartic with the curve defined by x = 0
is 3P0 + P∞ .
(c) Show that the intersection divisor of the Klein quartic with the curve defined by y = 0
is P0 + 3P∞′.
(d) Find the intersection divisor of the Klein quartic with the curve defined by z = 0.
(e) Compute div(x i y j /z i+ j ) where i and j are nonnegative integers (including the possibility that i = j = 0).
(f ) Show that x i y j /z i+ j is in L(D) if and only if 2i + 3 j ≤ k and i ≤ 2 j.
(g) Find a basis of L(D) for k ≥ 5 using the functions in part (f ). Hint: Compute this first
for k = 5, 6, . . . , 12 and then show that you can assume 0 ≤ i ≤ 2 since
x^3 y/z^4 ≈X y^3/z^3 + x/z.
(h) Let P be the 22 affine points on X in P2 (F8 ) (see Exercise 775). For each of the codes
C(X , P, D) over F8 with 5 ≤ k ≤ 21, apply Theorem 13.4.3 to find the dimension of
the code and a lower bound on its minimum distance. Give an upper bound on the
minimum distance from the Singleton Bound. Finally, find a generator matrix for the
code when k = 21.
Exercise 796 Prove that if D ⪰ 0, then C(X , P, D) contains the all-one codeword.
Recall that the dual of a generalized Reed–Solomon code is also a generalized Reed–
Solomon code by Theorem 5.3.3. Since generalized Reed–Solomon codes are algebraic
geometry codes by Exercise 790, it is natural to ask if duals of algebraic geometry codes are
also algebraic geometry codes. The answer is yes. It can be shown (see [135, Theorem 2.72]
or [320, Theorem II.2.10]) that C(X , P, D)⊥ = C(X , P, E), where E = P1 + · · · + Pn −
D + (η) for a certain Weil differential η determined by P; this differential is related to
residues of functions mentioned at the conclusion of Section 13.2.2 and in Exercise 771.
13.5 The Gilbert–Varshamov Bound revisited
In Section 2.10 we derived upper and lower bounds on the largest possible rate
αq(δ) = lim sup_{n→∞} n^{−1} log_q Aq(n, δn)
for a family of codes over Fq of lengths going to infinity with relative distances approaching
δ. The Asymptotic Gilbert–Varshamov Bound is the lower bound αq (δ) ≥ 1 − Hq (δ) guaranteeing that there exists a family of codes with relative distances approaching δ and rates
approaching or exceeding 1 − Hq (δ), where Hq is the Hilbert entropy function defined on
0 ≤ x ≤ r = 1 − q^{−1} by
Hq(x) = 0 if x = 0, and
Hq(x) = x log_q(q − 1) − x log_q x − (1 − x) log_q(1 − x) if 0 < x ≤ r,
as we first described in Section 2.10.3. For 30 years after the publication of the Gilbert–
Varshamov Bound, no family of codes had been demonstrated to exceed this bound until
such a family of algebraic geometry codes was shown to exist in 1982. In this section we
discuss this result; but before doing so we show that there is a family of Goppa codes that
meet this bound.
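The entropy function and the resulting Gilbert–Varshamov lower bound on αq(δ) are straightforward to compute; the sketch below (function names are my own, and the checked values are illustrative, not from the text) tabulates 1 − Hq(δ):

```python
from math import log

def H(q, x):
    """The entropy function H_q on 0 <= x <= 1 - 1/q."""
    if x == 0:
        return 0.0
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

def gv_rate(q, delta):
    """Asymptotic Gilbert-Varshamov lower bound 1 - H_q(delta) on alpha_q(delta)."""
    return 1 - H(q, delta)

assert abs(H(2, 0.5) - 1.0) < 1e-12   # binary entropy peaks at delta = 1/2
assert gv_rate(2, 0.0) == 1.0         # zero relative distance allows rate 1
assert 0.4 < gv_rate(2, 0.11) < 0.6   # roughly rate 1/2 near delta = 0.11
```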
13.5.1 Goppa codes meet the Gilbert–Varshamov Bound
Theorem 5.1.10 states that primitive BCH codes over Fq are asymptotically bad, where we
recall that a set of codes of lengths going to infinity is asymptotically bad if either the rates go
to 0 or the relative distances go to 0. This is true of other families of codes such as the family
of cyclic codes whose extended codes are affine-invariant [161]. The family of binary codes
obtained from Reed–Solomon codes over F2m by concatenation with the [m, m, 1] binary
code consisting of all binary m-tuples is also asymptotically bad [218, Chapter 10]. Recall
that a set of codes of lengths approaching infinity is asymptotically good if both the rates
and relative distances are bounded away from 0. Certainly, codes that meet the Asymptotic
Gilbert–Varshamov Bound are asymptotically good. For most families of codes, such as
cyclic codes, it is not known if the family is asymptotically bad or if it contains a subset
of asymptotically good codes. The first explicit construction of asymptotically good codes
was due to Justesen [160] in 1972; his codes are a modification of the concatenated Reed–
Solomon codes mentioned above. While the Justesen codes are asymptotically good, they
do not meet the Asymptotic Gilbert–Varshamov Bound. However, Goppa codes include
a set of codes that meet the Asymptotic Gilbert–Varshamov Bound. We prove this here.
However, the reader will note that the construction of this set of Goppa codes is not explicit;
that is, one cannot write down the family of specific codes that meet the bound.
In our construction we will need a lower bound on the number I_{q^t}(e) of irreducible monic
polynomials of degree e in F_{q^t}[x]. The exact value of I_{q^t}(e), computed in [18], is
$$I_{q^t}(e) = \frac{1}{e}\sum_{d\mid e}\mu(d)\,q^{te/d},$$
where µ is the Möbius function given by µ(d) = 1 if d = 1, µ(d) = (−1)^r if d is a product
of r distinct primes, and µ(d) = 0 otherwise. The lower bound we will need is
$$I_{q^t}(e) \ge \frac{q^{te}}{e}\bigl(1 - q^{-te/2+1}\bigr); \tag{13.12}$$
the proof of this can also be found in [18].
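The Möbius-function formula for I_{q^t}(e) can be checked directly by machine; the following sketch (our own helper functions, computing over Q = q^t) also verifies the lower bound (13.12) for sample parameters:

```python
def mobius(d: int) -> int:
    """Mobius function: 1 at d=1, (-1)^r for a product of r distinct primes, else 0."""
    if d == 1:
        return 1
    r, n, p = 0, d, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # repeated prime factor
            r += 1
        p += 1
    if n > 1:
        r += 1                    # one remaining large prime factor
    return -1 if r % 2 else 1

def num_irreducible(Q: int, e: int) -> int:
    """I_Q(e): number of monic irreducible degree-e polynomials over F_Q."""
    return sum(mobius(d) * Q**(e // d) for d in range(1, e + 1) if e % d == 0) // e

# Binary counts: 2 linear, 1 quadratic, 2 cubic, 3 quartic irreducibles, ...
print([num_irreducible(2, e) for e in (1, 2, 3, 4, 5)])   # [2, 1, 2, 3, 6]
# Lower bound (13.12) with Q = q^t:
print(num_irreducible(2, 4) >= (2**4 / 4) * (1 - 2**(-4 / 2 + 1)))   # True
```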
Theorem 13.5.1 There is a family of Goppa codes over Fq that meets the Asymptotic
Gilbert–Varshamov Bound.
Proof: Let t and d be positive integers. Consider the Goppa code Γ(L, G) over F_q of length
n = q^t with L = {γ_0, γ_1, . . . , γ_{n−1}} = F_{q^t} and G ∈ F_{q^t}[x]. The polynomial G must not have
roots in L, a requirement that is certainly satisfied if we choose G to be irreducible of degree
e > 1 over F_{q^t}. We need to decide what condition must be satisfied for the existence of such
a G so that Γ(L, G) has minimum distance at least d. To that end, let c = c_0c_1 · · · c_{n−1} ∈ F_q^n
where 0 < wt(c) = w. If c ∈ Γ(L, G), then $\sum_{i=0}^{n-1} c_i/(x - \gamma_i) = a(x)/b(x)$, where b(x) is a
product of w of the factors (x − γ_i) and deg(a(x)) ≤ w − 1. Furthermore, a(x) is a multiple
of G(x); in particular, G(x) must be one of the irreducible factors of a(x) of degree e. There
are at most ⌊(w − 1)/e⌋ such factors. So if we want to make sure that c ∉ Γ(L, G), which
we do when w < d, we must eliminate at most ⌊(w − 1)/e⌋ choices for G from among
all the I_{q^t}(e) irreducible polynomials of degree e in F_{q^t}[x]. For each 1 ≤ w < d there are
$\binom{n}{w}(q-1)^w$ vectors c that are not to be in Γ(L, G). Thus there are at most
$$\sum_{w=1}^{d-1}\left\lfloor\frac{w-1}{e}\right\rfloor\binom{n}{w}(q-1)^w < \frac{d}{e}\,V_q(n,d)$$
irreducible polynomials that cannot be our choice for G, where $V_q(n,d) = \sum_{i=0}^{d}\binom{n}{i}(q-1)^i$.
Hence, we can find an irreducible G so that Γ(L, G) has minimum distance at least d
provided
$$\frac{d}{e}\,V_q(n,d) < \frac{q^{te}}{e}\bigl(1 - q^{-te/2+1}\bigr), \tag{13.13}$$
13.5 The Gilbert–Varshamov Bound revisited
by (13.12). If δ = d/n, taking logarithms base q and dividing by n, we obtain
$$n^{-1}\bigl[\log_q(\delta n) + \log_q V_q(n,\delta n)\bigr] < \frac{te}{n} + n^{-1}\log_q\bigl(1 - q^{-te/2+1}\bigr).$$
Taking limits as n approaches infinity, Lemma 2.10.3 yields H_q(δ) ≤ lim_{n→∞} te/n, or 1 −
H_q(δ) ≥ 1 − lim_{n→∞} te/n. Since t = log_q n, we can choose an increasing sequence of values
of e growing fast enough that both the inequality in (13.13) is maintained (guaranteeing the
existence of a sequence of Goppa codes of increasing lengths n = q^t with minimum
distances at least δn) and 1 − H_q(δ) = 1 − lim_{n→∞} te/n. Theorem 13.2.1 implies that
the codes in our sequence have rate at least 1 − te/n; therefore this sequence meets the
Asymptotic Gilbert–Varshamov Bound.
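Condition (13.13) can be tested numerically for concrete parameters. The sketch below is an illustration of ours (not part of the proof), comparing d·V_q(n, d) against q^{te}(1 − q^{−te/2+1}) after clearing the common factor 1/e:

```python
from math import comb

def V(q: int, n: int, d: int) -> int:
    """V_q(n, d) = sum_{i=0}^{d} C(n, i) (q-1)^i."""
    return sum(comb(n, i) * (q - 1)**i for i in range(d + 1))

def condition_13_13(q: int, t: int, e: int, d: int) -> bool:
    """True when (d/e) V_q(q^t, d) < (q^(te)/e)(1 - q^(-te/2 + 1))."""
    n = q**t
    lhs = d * V(q, n, d)                      # both sides multiplied by e
    rhs = q**(t * e) * (1 - q**(-t * e / 2 + 1))
    return lhs < rhs

# q = 2, n = 2^10 = 1024, e = 5, d = 3: the condition holds, so an
# irreducible G of degree 5 exists giving minimum distance at least 3.
print(condition_13_13(2, 10, 5, 3))   # True
```

For very small parameters (say q = 2, t = 4, e = 2, d = 3) the condition fails, which is consistent with the bound only becoming effective as the length n = q^t grows.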
13.5.2 Algebraic geometry codes exceed the Gilbert–Varshamov Bound
The 1982 result of Tsfasman, Vlăduţ, and Zink [333] showed for the first time that there
exists a sequence of codes whose relative distances approach δ with rates exceeding 1 −
Hq (δ) as their lengths go to infinity. We will only outline this result as the mathematics
involved is beyond the scope of this book.
Let X be a curve of genus g over Fq , P a set of size n consisting of Fq -rational points
on X , and D a divisor on X with P ∩ supp(D) = ∅ where 2g − 2 < deg(D) < n. By
Theorem 13.4.3, C(X, P, D) is an [n, k, d] code over F_q with rate R = k/n = (deg(D) +
1 − g)/n and relative distance d/n ≥ (n − deg(D))/n. Thus
$$R + \delta \ge \frac{\deg(D)+1-g}{n} + \frac{n-\deg(D)}{n} = 1 + \frac{1}{n} - \frac{g}{n}.$$
In particular,
$$R \ge -\delta + 1 + \frac{1}{n} - \frac{g}{n}. \tag{13.14}$$
Thus we wish to show that there exists a sequence of codes, defined from curves, of increasing length n with relative distances tending toward δ such that g/n tends to a number as
close to 0 as possible. Such a construction is not possible using only curves in the projective
plane P2 (Fq ) as the number of points on the curve is finite (at most q 2 + q + 1), implying
that the length of a code defined on the curve cannot approach infinity since the length
is bounded by q 2 + q. This limitation is overcome by allowing the curves we examine to
come from higher dimensional projective space. Although individual curves still have only
a finite number of points, that number can grow as the genus grows. Fortunately, (13.14)
applies for curves in higher dimensional space, as does Theorem 13.4.3.
Since we wish g/n to be close to 0, we want n/g to be as large as possible. To explore
the size of this quantity, define Nq (g) to be the maximum number of Fq -rational points on
a nonsingular absolutely irreducible curve⁶ X in P^m(F_q) of genus g, and let
$$A(q) = \limsup_{g\to\infty} \frac{N_q(g)}{g}.$$
⁶ A nonsingular plane curve is absolutely irreducible, but this is not necessarily true of curves in higher dimensions.
To construct the codes that we desire we first construct an appropriate family of curves
and then define the codes over these curves. By our definitions, there exists a sequence of
nonsingular absolutely irreducible curves X i over Fq of genus gi with n i + 1 Fq -rational
points on X_i, where n_i and g_i go to infinity and lim_{i→∞}(n_i + 1)/g_i = A(q). Choose an F_q-rational point Q_i on X_i and let P_i be the n_i remaining F_q-rational points on X_i. Let r_i be an
integer satisfying 2gi − 2 < ri < n i . By Theorem 13.4.3, the code C i = C(X i , P i , ri Q i )
has length n i , rate Ri = (deg(ri Q i ) + 1 − gi )/n i = (ri + 1 − gi )/n i , and relative distance
δ_i ≥ 1 − deg(r_i Q_i)/n_i = 1 − r_i/n_i. Thus
$$R_i + \delta_i \ge \frac{r_i + 1 - g_i}{n_i} + 1 - \frac{r_i}{n_i} = 1 + \frac{1}{n_i} - \frac{g_i}{n_i},$$
implying
$$R \ge -\delta + 1 - \frac{1}{A(q)},$$
where lim_{i→∞} R_i = R and lim_{i→∞} δ_i = δ. Hence,
$$\alpha_q(\delta) \ge -\delta + 1 - \frac{1}{A(q)}, \tag{13.15}$$
producing a lower bound on αq (δ). We must determine A(q) in order to see if this bound is
an improvement on the Asymptotic Gilbert–Varshamov Bound.
An upper bound on A(q) was determined by Drinfeld and Vlăduţ [74].
Theorem 13.5.2 For any prime power q, A(q) ≤ √q − 1.
There is actually equality in this bound by a result of Tsfasman, Vlăduţ, and Zink [333] for
q = p 2 or q = p 4 , p a prime, and a result of Ihara [153] for q = p 2m with m ≥ 3.
Theorem 13.5.3 Let q = p 2m where p is a prime. There is a sequence of nonsingular
absolutely irreducible curves X i over Fq with genus gi and n i + 1 rational points such that
$$\lim_{i\to\infty} \frac{n_i}{g_i} = \sqrt{q} - 1.$$
This result and (13.15) give the following bound on αq (δ).
Theorem 13.5.4 (Asymptotic Tsfasman–Vlăduţ–Zink Bound) If q = p 2m where p is a
prime, then
$$\alpha_q(\delta) \ge -\delta + 1 - \frac{1}{\sqrt{q} - 1}.$$
This lower bound is determined by the straight line R = −δ + 1 − 1/(√q − 1) of negative slope that may or may not intersect the Gilbert–Varshamov curve R = 1 − H_q(δ),
which is concave up. If the line intersects the curve in two points, we have shown that there
is a sequence of codes exceeding the Asymptotic Gilbert–Varshamov Bound. This occurs
when q ≥ 49.
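The comparison of the two bounds is easy to carry out numerically; the following sketch (our own, with the entropy function transcribed from Section 2.10.3) confirms that the Tsfasman–Vlăduţ–Zink line lies above the Gilbert–Varshamov curve at δ = 1/2 for q = 49, while for q = 4 it never does:

```python
import math

def H(x: float, q: int) -> float:
    """Hilbert entropy function H_q(x)."""
    if x == 0:
        return 0.0
    lg = lambda y: math.log(y, q)
    return x * lg(q - 1) - x * lg(x) - (1 - x) * lg(1 - x)

def gv(delta: float, q: int) -> float:
    """Asymptotic Gilbert-Varshamov lower bound on alpha_q(delta)."""
    return 1 - H(delta, q)

def tvz(delta: float, q: int) -> float:
    """Asymptotic Tsfasman-Vladut-Zink lower bound on alpha_q(delta)."""
    return 1 - delta - 1 / (math.sqrt(q) - 1)

print(tvz(0.5, 49) > gv(0.5, 49))   # True: TVZ beats GV at delta = 0.5 for q = 49
print(tvz(0.3, 4) > gv(0.3, 4))     # False: for q = 4 the TVZ line never wins
```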
The curves in Theorem 13.5.3 are modular and unfortunately the proof does not give
an explicit construction. In 1995, Garcia and Stichtenoth [96, 97] constructed a tower of
function fields that potentially could be used to construct a sequence of algebraic geometry
codes that would exceed the Asymptotic Gilbert–Varshamov Bound. A low complexity
algorithm to accomplish this was indeed formulated in 2000 [307].
Exercise 797 For each of q = 4, q = 49, and q = 64 draw the graphs of the inequalities given by the Asymptotic Tsfasman–Vlăduţ–Zink and Asymptotic Gilbert–Varshamov
Bounds. For q = 49 and q = 64, give a range of values of δ such that the Tsfasman–Vlăduţ–
Zink Bound exceeds the Gilbert–Varshamov Bound.
14
Convolutional codes
The [n, k] codes that we have studied to this point are called block codes because we
encode a message of k information symbols into a block of length n. Convolutional codes, on
the other hand, use an encoding scheme that depends not only upon the current message
being transmitted but also upon a certain number of preceding messages. Thus “memory” is an
important feature of an encoder of a convolutional code. For example, if x(1), x(2), . . .
is a sequence of messages each from Fqk to be transmitted at time 1, 2, . . . , then an (n, k)
convolutional code with memory M will transmit codewords c(1), c(2), . . . where c(i) ∈ Fqn
depends upon x(i), x(i − 1), . . . , x(i − M). In our study of linear block codes we have
discovered that it is not unusual to consider codes of fairly large lengths n and dimensions k.
In contrast, the study and application of convolutional codes has dealt primarily with (n, k)
codes with n and k very small and a variety of values of M.
Convolutional codes were developed by Elias [78] in 1955. In this chapter we will only
introduce the subject and restrict ourselves to binary codes. While there are a number of
decoding algorithms for convolutional codes, the main one is due to Viterbi; we will examine his algorithm in Section 14.2. The early mathematical theory of these codes was
developed extensively by Forney [88, 89, 90] and has been given a modern algebraic treatment by McEliece in [235]. We use the latter extensively in our presentation, particularly in
Sections 14.3, 14.4, and 14.5. Those interested in a more in-depth treatment should consult
the monograph [225] by Massey, or the books by Lin and Costello [197], McEliece [232],
Piret [257], and Wicker [351]. One of the most promising applications of convolutional
codes is to the construction of turbo codes, which we introduce in Section 15.7. Convolutional codes, along with several codes we have developed in previous chapters, have played
a key role in deep space exploration, the topic of the final section of the book.
14.1
Generator matrices and encoding
There are a number of ways to define convolutional codes. We will present a simple definition
that is reminiscent of the definition for a linear block code.
We begin with some terminology. Let D be an indeterminate; the symbol “D” is chosen
because it will represent a delay operation. Then F2 [D] is the set of all binary polynomials
in the variable D. This set is an integral domain and has a field of quotients that we denote
F2 (D). Thus F2 (D) consists of all rational functions in D, that is, all ratios of polynomials
in D. An (n, k) convolutional code C is a k-dimensional subspace of F2 (D)n . Notice that
as F2 (D) is an infinite field, C contains an infinite number of codewords. The rate of C
is R = k/n; for every k bits of information, C generates n codeword bits. A generator
matrix G for C is a k × n matrix with entries in F2 (D) whose rows form a basis of C. Any
multiple of G by a nonzero element of F2 (D) will also be a generator matrix for C. Thus
if we multiply G by the least common multiple of all denominators occurring in entries
of G, we obtain a generator matrix for C all of whose entries are polynomials. From this
discussion we see that a convolutional code has a generator matrix all of whose entries are
polynomials; such a matrix is called a polynomial generator matrix for C. In Section 14.4
we will present a binary version of these polynomial generator matrices.
Before proceeding we make two observations. First, as with linear block codes, the choice
of a generator matrix for a convolutional code determines the encoding. We will primarily
consider polynomial generator matrices for our codes. Even among such matrices, some
choices are better than others for a variety of reasons. Second, until the encoding procedure
and the interpretation of “D” are explained, the connection between message and codeword
will not be understood. We explore this after an example.
Example 14.1.1 Let C 1 be the (2, 1) convolutional code with generator matrix
G_1 = [1 + D + D^2    1 + D^2].
A second generator matrix for C 1 is
G′_1 = [1 + D^3    1 + D + D^2 + D^3],
which is obtained from G 1 by multiplying by 1 + D, recalling that we are working in binary.
Let C 2 be the (4, 2) convolutional code with generator matrix
$$G_2 = \begin{bmatrix} 1 & 1+D+D^2 & 1+D^2 & 1+D \\ 0 & 1+D & D & 1 \end{bmatrix}.$$
By Exercise 798, C 2 is also generated by
$$G'_2 = \begin{bmatrix} 1 & D & 1+D & 0 \\ 0 & 1+D & D & 1 \end{bmatrix},\quad G''_2 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1+D & D & 1 \end{bmatrix},\quad\text{and}\quad G'''_2 = \begin{bmatrix} 1 & D & 1+D & 0 \\ D & 1+D+D^2 & D^2 & 1 \end{bmatrix}.$$
Exercise 798 By applying elementary row operations to G_2 given in Example 14.1.1,
prove that C_2 is also generated by G′_2, G′′_2, and G′′′_2.
We now describe the encoding procedure for an (n, k) convolutional code C using a
polynomial generator matrix G. To emphasize the intimate connection between G and the
encoding process, we often refer to G as an encoder. To simplify matters, we first begin with
the case k = 1. Because k = 1, at each time i = 0, 1, . . . , a single bit (element of F2 ) x(i) is
input to the generator matrix G. The input is thus a stream of bits “entering” the generator
matrix one bit per time interval. Suppose that this input stream has L bits. We can express
this input stream as the polynomial $x = \sum_{i=0}^{L-1} x(i)D^i$. The encoding is given by xG = c,
analogous to the usual encoding of a linear block code. The codeword c = (c1 , c2 , . . . , cn )
has n components c j for 1 ≤ j ≤ n, each of which is a polynomial in D. At time 0, the
generator matrix has produced the n bits c1 (0), . . . , cn (0); at time 1, it has produced the n
bits c1 (1), . . . , cn (1), etc. Often the components c1 , . . . , cn are interleaved so that the output
stream looks like
c1 (0), . . . , cn (0), c1 (1), . . . , cn (1), c1 (2), . . . , cn (2), . . . .
(14.1)
Note that the left-most bits of the input stream x(0), x(1), x(2), . . . are actually input first
and the left-most bits of (14.1) are output first.
Example 14.1.2 Suppose that in Example 14.1.1 G 1 is used to encode the bit stream
110101 using the (2, 1) code C 1 . This bit stream corresponds to the polynomial x =
1 + D + D 3 + D 5 ; thus 1s enter the encoder at times 0, 1, 3, and 5, and 0s enter at
times 2 and 4. This is encoded to xG 1 = (c1 , c2 ), where c1 = (1 + D + D 3 + D 5 )(1 + D +
D 2 ) = 1 + D 4 + D 6 + D 7 and c2 = (1 + D + D 3 + D 5 )(1 + D 2 ) = 1 + D + D 2 + D 7 .
Thus c1 is 10001011 and c2 is 11100001. These are interleaved so that the output is
1101010010001011. If we look more closely at the multiplications and additions done
to produce c1 and c2 , we can see how memory plays a role. Consider the output of codeword c1 at time i = 5, that is, c1 (5). This value is the coefficient of D 5 in the product
x(1 + D + D 2 ), which is x(5) + x(4) + x(3). So at time 5, the encoder must “remember”
the two previous inputs. Similarly, we see that
c_1(i) = x(i) + x(i − 1) + x(i − 2),   (14.2)
c_2(i) = x(i) + x(i − 2).   (14.3)
For this reason, we say that the memory of the encoder is M = 2. We also see the role
played by D as a delay operator. Since c1 is x(1 + D + D 2 ), at time i we have c1 (i) is
x(i)(1 + D + D 2 ). Thus we obtain the above expression for c1 (i) if we interpret x(i)D
as x(i − 1) and x(i)D 2 as x(i − 2). Each occurrence of D j delays the input by j units of
time.
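The encoding in Example 14.1.2 is exactly polynomial multiplication over F_2, i.e. binary convolution of the input stream with each generator polynomial, followed by interleaving. A sketch (function names are ours):

```python
def conv_f2(x, g):
    """Multiply two F2[D] polynomials, given as low-degree-first bit lists."""
    out = [0] * (len(x) + len(g) - 1)
    for i, xi in enumerate(x):
        for j, gj in enumerate(g):
            out[i + j] ^= xi & gj
    return out

def encode_rate_1_n(x, gens):
    """Encode bit list x with the (n, 1) encoder G = [g_1 ... g_n]; interleave."""
    streams = [conv_f2(x, g) for g in gens]
    return "".join(str(s[i]) for i in range(len(streams[0])) for s in streams)

# G_1 = [1 + D + D^2   1 + D^2] applied to 110101 (x = 1 + D + D^3 + D^5)
g1 = [[1, 1, 1], [1, 0, 1]]
print(encode_rate_1_n([1, 1, 0, 1, 0, 1], g1))   # 1101010010001011
```

The convolution automatically produces the two extra output blocks coming from the memory M = 2, since the product polynomial has degree deg(x) + 2.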
Exercise 799 Suppose that in Example 14.1.1 G ′1 is used to encode messages.
(a) Give the resulting codeword (c1 , c2 ) if 110101 is encoded. Also give the interleaved
output.
(b) Give equations for c1 (i) and c2 (i) for the encoder G ′1 corresponding to equations (14.2)
and (14.3).
(c) What is the memory M for this encoder?
We now examine the encoding process for arbitrary k. This time the input is k bit streams
x_j for j = 1, 2, . . . , k, which together form our message x = (x_1, x_2, . . . , x_k). Note that the k bits
x j (0) enter the encoder at time 0 followed by the k bits x j (1) at time 1 and then x j (2) at
time 2, etc. We can write each x j as a polynomial in D as before. The codeword produced
is again xG = c = (c1 , c2 , . . . , cn ) with n components each of which is a polynomial in
D. These components are then interleaved as earlier. Notice that the resulting codeword
is a polynomial combination of the rows of G, which we have given an interpretation
as bit stream outputs. Recalling that the code C is the F2 (D)-row span of the rows of
a generator matrix, the codewords with polynomial components have a meaning as bit
streams where the coefficient of D i gives the output at time i. Codewords with some
components equal to rational functions also have an interpretation, which we explore in
Section 14.4.
Example 14.1.3 Suppose that in Example 14.1.1 G_2 is used to encode the message
(11010, 10111) using the (4, 2) code C_2. This message corresponds to the polynomial pair x = (1 + D + D^3, 1 + D^2 + D^3 + D^4). Computing xG_2 we obtain the codeword c = (1 + D + D^3, D + D^2 + D^4, 1 + D^2 + D^3 + D^4, 0), which corresponds to
(1101, 01101, 10111, 0). Because the resulting components do not have the same length,
the question becomes how to do the interleaving. It is simply done by padding on the
right with 0s. Note that the components of the message each have length 5 and so correspond
to polynomials of degree at most 4. As the maximum degree of any entry in G 2 is 2, the
resulting codeword could have components of degree at most 6 and hence length 7. Thus it
would be appropriate to pad each component with 0s up to length 7. So we would interleave
(1101000, 0110100, 1011100, 0000000) to obtain 1010110001101010011000000000. For
this encoder the equations corresponding to (14.2) and (14.3) become
c1 (i) = x1 (i),
c2 (i) = x1 (i) + x1 (i − 1) + x1 (i − 2) + x2 (i) + x2 (i − 1),
c3 (i) = x1 (i) + x1 (i − 2) + x2 (i − 1),
c4 (i) = x1 (i) + x1 (i − 1) + x2 (i).
This encoder has memory M = 2.
Exercise 800 Suppose that in Example 14.1.1 G ′2 is used to encode messages.
(a) Give the resulting codeword (c1 , c2 , c3 , c4 ) if (11010, 10111) is encoded. Also give the
interleaved output.
(b) Give equations for c1 (i), . . . , c4 (i) for the encoder G ′2 corresponding to the equations
in Example 14.1.3.
(c) What is the memory M for this encoder?
Exercise 801 Repeat Exercise 800 using the encoder G ′′2 of Example 14.1.1.
Exercise 802 Repeat Exercise 800 using the encoder G′′′_2 of Example 14.1.1.
When using a convolutional code, one may be interested in constructing a physical
encoder using shift-registers. Recall from Section 4.2 that we gave encoding schemes for
cyclic codes and indicated how one of these schemes could be physically implemented
using a linear feedback shift-register. For encoding convolutional codes we will use linear
feedforward shift-registers. The main components of such a shift-register are again delay
elements (also called flip-flops) and binary adders shown in Figure 4.1. As with encoders
for cyclic codes, the encoder for a convolutional code is run by an external clock which
generates a timing signal, or clock cycle, every t0 seconds.
Example 14.1.4 Figure 14.1 gives the design of a physical encoder for the code C 1 of
Example 14.1.1 using the encoder G 1 . Table 14.1 shows the encoding of 110101 using this
circuit. The column “Register before cycle” shows the contents of the shift-register before
the start of clock cycle i. Notice that the shift-register initially contains x(0) = 1, x(−1) = 0,
Table 14.1 Encoding 110101 using Figure 14.1

Register before cycle | Clock cycle i | c1(i) | c2(i)
----------------------|---------------|-------|------
100                   | 0             | 1     | 1
110                   | 1             | 0     | 1
011                   | 2             | 0     | 1
101                   | 3             | 0     | 0
010                   | 4             | 1     | 0
101                   | 5             | 0     | 0
010                   | 6             | 1     | 0
001                   | 7             | 1     | 1
?00                   | 8             |       |
[Figure 14.1 Physical encoder for G_1 = [1 + D + D^2   1 + D^2]: a feedforward shift-register holding x(i), x(i − 1), x(i − 2), with binary adders producing c_1 = x(i) + x(i − 1) + x(i − 2) and c_2 = x(i) + x(i − 2).]
and x(−2) = 0; this is the contents before clock cycle 0. At the start of clock cycle i, x(i + 1)
enters the shift-register from the left pushing x(i) into the middle flip-flop and x(i − 1) into
the right-most flip-flop; simultaneously c1 (i) and c2 (i) are computed by the binary adders.
Notice that the last digit x(5) = 1 enters before the start of clock cycle 5; at that point the
shift-register contains 101. Because the encoder has memory M = 2, the clock cycles two
more times to complete the encoding. Thus two more inputs must be given. These inputs
are obviously 0 since 1 + D + D 3 + D 5 is the same as 1 + D + D 3 + D 5 + 0D 6 + 0D 7 .
Notice that the output c1 and c2 agrees with the computation in Example 14.1.2. Clock
cycle 7 is completed with a new bit entering the shift-register on the left. This could be
the first bit of the next message to be encoded; the two right-most registers both contain 0,
preparing the circuit for transmission of the next message. This first bit of the next message
is indicated with a “?” in Table 14.1. Note that the first column of “Register before cycle”
is the input message.
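The clocked behavior of the circuit in Figure 14.1 can be simulated directly; the sketch below (our own, with the two flushing 0s appended automatically) reproduces the output columns of Table 14.1:

```python
def shift_register_encode(bits, memory=2):
    """Simulate the feedforward shift-register of Figure 14.1 for
    G_1 = [1 + D + D^2   1 + D^2], flushing with `memory` zeros."""
    reg = [0, 0]                         # x(i-1), x(i-2)
    c1, c2 = [], []
    for b in bits + [0] * memory:
        c1.append(b ^ reg[0] ^ reg[1])   # taps of 1 + D + D^2
        c2.append(b ^ reg[1])            # taps of 1 + D^2
        reg = [b, reg[0]]                # clock: shift one place to the right
    return c1, c2

c1, c2 = shift_register_encode([1, 1, 0, 1, 0, 1])
print(c1)   # [1, 0, 0, 0, 1, 0, 1, 1]  (agrees with Table 14.1)
print(c2)   # [1, 1, 1, 0, 0, 0, 0, 1]
```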
Exercise 803 Draw a physical encoder analogous to Figure 14.1 for the encoder G ′1 from
Example 14.1.1. Also construct a table analogous to Table 14.1 for the encoding of 110101
using the circuit you have drawn. Compare your answer with that obtained in Exercise
799.
Exercise 804 Draw a physical encoder with two shift-registers analogous to Figure 14.1
for the encoder G 2 from Example 14.1.1. Then construct a table analogous to Table 14.1
for the encoding of (11010, 10111) using the circuit you have drawn. Compare your answer
with that obtained in Example 14.1.3.
Exercise 805 Repeat Exercise 804 using the encoder G ′2 from Example 14.1.1. Compare
your answer with that obtained in Exercise 800.
Exercise 806 Repeat Exercise 804 using the generator matrix G ′′2 from Example 14.1.1.
Compare your answer with that obtained in Exercise 801.
Exercise 807 Repeat Exercise 804 using the generator matrix G′′′_2 from Example 14.1.1.
Compare your answer with that obtained in Exercise 802.
14.2
Viterbi decoding
In this section we present the Viterbi Decoding Algorithm for decoding convolutional codes.
This algorithm was introduced by A. J. Viterbi [339] in 1967. The Viterbi Algorithm has
also been applied in more general settings; see [234, 257, 336]. To understand this algorithm
most easily, we first describe state diagrams and trellises.
14.2.1 State diagrams
To each polynomial generator matrix of a convolutional code we can associate a state
diagram that allows us to do encoding. The state diagram is intimately related to the shift-register diagram and provides a visual way to find the output at any clock time.
Let G be a polynomial generator matrix for an (n, k) convolutional code C, from which
a physical encoder has been produced. As earlier, it is easiest to define the state diagram
first for the case k = 1. The state of an encoder at time i is essentially the contents of the
shift-register at time i that entered the shift-register prior to time i. For example, the state
of the encoder G 1 at time i with physical encoder given in Figure 14.1 is the contents
(x(i − 1), x(i − 2)) of the two right-most delay elements of the shift-register. If we know
the state of the encoder at time i and the input x(i) at time i, then we can compute the
output (c1 (i), c2 (i)) at time i from this information. More generally, if the shift-register
at time i contains x(i), x(i − 1), . . . , x(i − M), then the encoder is in the state (x(i − 1),
x(i − 2), . . . , x(i − M)). In the previous section we saw that the encoding equation c =
xG yields n equations for the n c j (i)s computed at time i in terms of x(i), x(i − 1), . . . ,
x(i − M), where M is the memory of the encoder. Thus from these equations we see that
if we have the state at time i and the input x(i), we can compute the output c(i).
Example 14.2.1 Consider Table 14.1. Before the start of clock cycle 3, the contents of
the shift-register is 101 and hence the encoder is in state 01. Since x(3) = 1 the encoder
moves to the state 10, which is its state before clock cycle 4 begins, regardless of the
input x(4).
[Figure 14.2 State diagrams for G_1 = [1 + D + D^2   1 + D^2]: (a) the four states 00, 01, 10, 11 with solid (input 0) and dashed (input 1) directed edges labeled by their two output bits; (b) the same diagram expanded to show the states at clock cycles i − 1 and i.]
The set of states of an encoder of an (n, 1) convolutional code with memory M is the set of
all ordered binary M-tuples. The set of states is of size 2^M and represents all possible states
that an encoder can be in at any time.
The state diagram is a labeled directed graph determined as follows. The vertices of
the graph are the set of states. There are two types of directed edges: solid and dashed.
A directed edge from one vertex to another is solid if the input is 0 and dashed if the
input is 1; the edge is then labeled with the output. In other words, a directed edge
from the vertex (x(i − 1), . . . , x(i − M)) to the vertex (x(i), x(i − 1), . . . , x(i − M + 1))
is solid if x(i) = 0 and dashed if x(i) = 1 and is labeled (c1 (i), . . . , cn (i)). Notice that
by looking at the vertices at the ends of the edge, we determine the contents of a shift-register at time i and can hence compute the edge label and whether the edge is solid or
dashed.
Example 14.2.2 Figure 14.2 shows the state diagram for the generator matrix G 1 from
Example 14.1.1. Two versions of the state diagram are given. One can encode any message
by traversing through the diagram in Figure 14.2(a) always beginning at state 00 and ending
at state 00. For example, if 101 is to be encoded, begin in state 00, traverse the dashed edge
as the first input is 1; this edge ends in state 10 and outputs 11. Next traverse the solid edge,
as the next input is 0, from state 10 to state 01 and output 10. Now traverse the dashed
edge from state 01 to state 10 outputting 00. We have run out of inputs, but are not at state
00 – the shift-register is not all 0s yet. So we input 0 and hence traverse from state 10 to state
01 along the solid edge with output 10; finally, input another 0 and travel from state 01 to
state 00 along the solid edge with output 11. The encoding of 101 is therefore 1110001011.
Figure 14.2(b) is essentially the same diagram expanded to show the states at clock times
i − 1 and i.
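Traversing the state diagram is just running a two-bit state machine; the sketch below (our own, using equations (14.2) and (14.3) and flushing with two 0s so that we end at state 00) reproduces the encoding of 101 from Example 14.2.2:

```python
def state_encode(bits):
    """Encode with the state diagram of Figure 14.2: state is (x(i-1), x(i-2));
    solid edges carry input 0, dashed edges input 1."""
    s1, s2 = 0, 0
    out = []
    for b in bits + [0, 0]:               # two flushing 0s return us to state 00
        out += [b ^ s1 ^ s2, b ^ s2]      # edge label (c1(i), c2(i))
        s1, s2 = b, s1                    # follow the edge to the next state
    assert (s1, s2) == (0, 0)
    return "".join(map(str, out))

print(state_encode([1, 0, 1]))   # 1110001011, as in Example 14.2.2
```

The same routine applied to 110101 returns the interleaved stream of Example 14.1.2, confirming that the state diagram and the polynomial encoder agree.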
Exercise 808 Draw two versions of the state diagram for the encoder G ′1 from
Example 14.1.1 as in Figure 14.2.
The same general idea holds for (n, k) convolutional codes with polynomial generator
matrices G = (g_{i,j}(D)) when k > 1. For 1 ≤ i ≤ k, let
$$m_i = \max_{1\le j\le n} \deg(g_{i,j}(D)) \tag{14.4}$$
be the maximum degree of an entry in row i; m_i is called the degree of the ith row of G. We
will adopt the convention that the zero polynomial has degree −∞. Let x = (x1 , . . . , xk ) be
a message to be encoded with the jth input at time i being x j (i). Let xG = c = (c1 , . . . , cn )
be the resulting codeword. Each c j (i) is a combination of some of the x J (I ) with I = i, i −
1, . . . , i − m J for 1 ≤ J ≤ k. Hence if we know the inputs x J (i) and all J th inputs from
times i − 1 back to time i − m J , we can determine the output. So at time i, we say that the
encoder is in the state (x1 (i − 1), . . . , x1 (i − m 1 ), x2 (i − 1), . . . , x2 (i − m 2 ), . . . , xk (i −
1), . . . , x_k(i − m_k)). The set of states of G is the set of all ordered binary (m_1 + m_2 + · · · +
m_k)-tuples. The set of states is of size $2^{m_1+m_2+\cdots+m_k}$ and represents all possible states that an
encoder can be in at any time. Note that the memory M of the encoder G is the maximum
of the m_i's; in particular, when k = 1, M = m_1.
Example 14.2.3 For the encoder G 2 of the (4, 2) convolutional code C 2 of Example 14.1.1,
m 1 = 2 and m 2 = 1. So there are eight states representing (x1 (i − 1), x1 (i − 2), x2 (i − 1))
for G 2 ; notice that c j (i) given in Example 14.1.3 is determined from the state (x1 (i −
1), x1 (i − 2), x2 (i − 1)) and the input (x1 (i), x2 (i)). For the generator matrix G ′2 of C 2 ,
m 1 = m 2 = 1, and the four states represent (x1 (i − 1), x2 (i − 1)). For the encoder G ′′2 of
C_2, m_1 = 0, m_2 = 1, and the two states represent x_2(i − 1). Finally, for the encoder G′′′_2 of
C 2 , m 1 = 1, m 2 = 2, and the eight states represent (x1 (i − 1), x2 (i − 1), x2 (i − 2)).
The state diagram for k > 1 is formed as in the case k = 1. The vertices of the labeled
directed graph are the set of states. This time, however, there must be 2k types of edges that
represent each of the 2k possible binary k-tuples coming from each of the possible inputs
(x1 (i), x2 (i), . . . , xk (i)). Again the edge is labeled with the output (c1 (i), . . . , cn (i)).
Example 14.2.4 Consider the state diagram analogous to that given in Figure 14.2(a) for
the generator matrix G 2 of the (4, 2) convolutional code C 2 of Example 14.1.1. The state
diagram has eight vertices and a total of 32 edges. The edges are of four types. There are four
edges, one of each of the four types, leaving each vertex. For instance, suppose that all edges
representing the input (x1 (i), x2 (i)) = (0, 0) are solid red edges, those representing (1, 0)
are solid blue edges, those representing (0, 1) are dashed blue edges, and those representing
(1, 1) are dashed red edges. Then, using the equations for c j (i) given in Example 14.1.3, the
solid red edge leaving (x1 (i − 1), x1 (i − 2), x2 (i − 1)) = (0, 1, 1) ends at vertex (0, 0, 0)
and is labeled (0, 0, 0, 0); the solid blue edge leaving (0, 1, 1) ends at vertex (1, 0, 0) and
is labeled (1, 1, 1, 1); the dashed blue edge leaving (0, 1, 1) ends at vertex (0, 0, 1) and is
labeled (0, 1, 0, 1); and the dashed red edge leaving (0, 1, 1) ends at vertex (1, 0, 1) and is
labeled (1, 0, 1, 0).
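One transition of this k = 2 state diagram can be computed mechanically from the output equations of Example 14.1.3; a sketch (the function name is ours):

```python
def edge_g2(state, inp):
    """One transition of the state diagram for G_2: state is
    (x1(i-1), x1(i-2), x2(i-1)), inp is (x1(i), x2(i)).  Returns the edge
    label (c1(i), ..., c4(i)) and the next state, per Example 14.1.3."""
    a1, a2, b1 = state
    x1, x2 = inp
    label = (x1,
             x1 ^ a1 ^ a2 ^ x2 ^ b1,   # c2(i)
             x1 ^ a2 ^ b1,             # c3(i)
             x1 ^ a1 ^ x2)             # c4(i)
    return label, (x1, a1, x2)

# The four edges leaving state (0, 1, 1) described in Example 14.2.4:
for inp in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(inp, edge_g2((0, 1, 1), inp))
```

Running this reproduces the four edge labels (0,0,0,0), (1,1,1,1), (0,1,0,1), and (1,0,1,0) claimed in the example, which is also the content of Exercise 809.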
[Figure 14.3 Trellis for the encoder G_1 = [1 + D + D^2   1 + D^2]: states a = 00, b = 01, c = 10, d = 11 at times i = 0, 1, . . . , 6, with solid (input 0) and dashed (input 1) edges labeled by their two output bits.]
Exercise 809 Verify that the edge labels for the four edges described in Example 14.2.4
are as claimed.
Exercise 810 Give the state diagram for the encoder G ′2 of the (4, 2) convolutional code
C 2 from Example 14.1.1. It may be easiest to draw the diagram in the form similar to that
given in Figure 14.2(b).
Exercise 811 Repeat Exercise 810 with the encoder G ′′2 from Example 14.1.1.
14.2.2 Trellis diagrams
The trellis diagram for an encoding of a convolutional code is merely an extension of the
state diagram starting at time i = 0. We illustrate this in Figure 14.3 for the generator matrix
G 1 of the (2, 1) convolutional code C 1 from Example 14.1.1. The states are labeled a = 00,
b = 01, c = 10, and d = 11. Notice that the trellis is a repetition of Figure 14.2(b) with
the understanding that at time i = 0, the only state used is the zero state a; hence at time
i = 1, the only states that can be reached at time i = 1 are a and c and thus they are the
only ones drawn at that time. Encoding can be accomplished by tracing left-to-right through
the trellis beginning at state a (and ending at the zero state a). For instance, if we let state
s ∈ {a, b, c, d} at time i be denoted by si , then the encoding of 1011 is accomplished by
the path a0 c1 b2 c3 d4 b5 a6 by following a dashed, then solid, then two dashed, and finally two
solid edges (as we must input two additional 0s to reach state a); this yields the codeword
111000010111 by writing down the labels on the edges of the path.
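Tracing such a trellis path can be sketched in code as well; the helper below (our own, reusing the state-update rule of the encoder G_1 with two flushing 0s) records both the visited states and the edge labels for the message 1011:

```python
def trellis_path(bits):
    """Trace the trellis of Figure 14.3 for G_1: return the sequence of
    visited states (as letter-plus-time labels) and the edge-label bits."""
    names = {(0, 0): "a", (0, 1): "b", (1, 0): "c", (1, 1): "d"}
    s1, s2 = 0, 0
    path, out = [names[(0, 0)] + "0"], []
    for i, b in enumerate(bits + [0, 0]):   # two flushing 0s to reach state a
        out += [b ^ s1 ^ s2, b ^ s2]        # edge label (c1(i), c2(i))
        s1, s2 = b, s1
        path.append(names[(s1, s2)] + str(i + 1))
    return "".join(path), "".join(map(str, out))

path, cw = trellis_path([1, 0, 1, 1])
print(path)   # a0c1b2c3d4b5a6
print(cw)     # 111000010111
```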
Exercise 812 Use the trellis of Figure 14.3 to encode the following messages:
(a) 1001,
(b) 0011.
Exercise 813 Draw the trellis diagram for the encoder G1′ of the (2, 1) convolutional code
C1 presented in Example 14.1.1. The results of Exercise 808 will be useful.
Exercise 814 Draw the trellis diagram for the encoder G2′ of the (4, 2) convolutional code
C 2 presented in Example 14.1.1. The results of Exercise 810 will be useful.
14.2 Viterbi decoding
[Figure 14.4: L = 6 truncated trellis for the encoder G1 = [1 + D + D^2  1 + D^2]; the received vector y of Example 14.2.5 is listed along the top.]
14.2.3 The Viterbi Algorithm
The Viterbi Algorithm uses the trellis diagram for an encoder of a convolutional code
to decode a received vector.¹ The version of the Viterbi Algorithm that we describe will
accomplish nearest neighbor decoding. Suppose that a message to be encoded is input over
L time periods using a generator matrix with memory M. The algorithm requires us to
consider the portion of the trellis that starts at time i = 0 and ends at time L + M; this is
called the L truncated trellis. We always move left-to-right through the trellis. Figure 14.4
shows the L = 6 truncated trellis for the generator matrix G 1 from Example 14.1.1 that we
must examine to decode a message of length 6.
Suppose that the message x(i) = (x1 (i), . . . , xk (i)) for i = 0, 1, . . . , L − 1 is encoded
using the generator matrix G to produce a codeword c(i) = (c1 (i), . . . , cn (i)) for i =
0, 1, . . . , L + M − 1. Assume y(i) = (y1 (i), . . . , yn (i)) for i = 0, 1, . . . , L + M − 1 is received. We define the weight of a path in the trellis. First, if e is an edge connecting state
s at time i − 1 to state s ′ at time i, then the weight of the edge e is the Hamming distance
between the edge label of e and the portion y(i − 1) of the received vector at time i − 1.
The weight of a path P through the trellis is the sum of the weights of the edges of the
path. The edge and path weights therefore depend upon the received vector. Denote the
zero state a at time 0 by a0 . Suppose P is a path in the trellis starting at a0 and ending at
time i in state s; such a path P is a survivor at time i if its weight is smallest among all
paths starting at a0 and ending at time i in state s. The collection of all survivors starting
at a0 and ending in state s at time i will be denoted S(s, i). If P is a path in the trellis
starting at a0 and ending at time I , define c P to be the codeword associated with P where
c P (i) is the label of the edge in P from the state at time i to the state at time i + 1 for
0 ≤ i < I . Define x P to be the message associated with P where x P (i) is the input identified by the type of the edge in P from the state at time i to the state at time i + 1 for
0 ≤ i < min{I, L}.
¹ Trellises and the Viterbi Algorithm can also be used for decoding block codes. For a discussion of this, see [336].
Convolutional codes
We now describe the Viterbi Decoding Algorithm in four steps:
I. Draw the L truncated trellis for G, replacing the edge labels by the edge weights. Let
a denote the zero state.
II. Compute S(s, 1) for all states s using the trellis of Step I.
III. Repeat the following for i = 2, 3, . . . , L + M using the trellis of Step I. Assuming
S(s, i − 1) has been computed for all states s, compute S(s, i) for all s as follows. For
each state s ′ and each edge e from s ′ to s, form the path P made from P ′ followed by e
where P ′ ∈ S(s ′ , i − 1). Include P in S(s, i) if it has smallest weight among all such
paths.
IV. A nearest neighbor to y is any c P for P ∈ S(a, L + M) obtained from the message
given by x P .
Why does this work? Clearly, a nearest neighbor c to y is a c P obtained from any path P
in S(a, L + M) since S(a, L + M) contains all the smallest weight paths from a0 to a L+M
with L + M edges, where a L+M denotes the zero state at time L + M. The only thing we
need to confirm is that Steps II and III produce S(a, L + M). This is certainly the case
if we show these steps produce S(s, i). Step II by definition produces S(s, 1) for all s. If
P ∈ S(s, i), then P is made up of a path P ′ ending at time i − 1 in state s ′ followed by an
edge e from s ′ to s. If P ′ is not in S(s ′ , i − 1), then there is a path P ′′ from a0 ending in state
s ′ at time i − 1 of lower weight than P ′ . But then the path P ′′ followed by e ends in state
s at time i and has lower weight than P, a contradiction. So Step III correctly determines
S(s, i).
Example 14.2.5 We illustrate the Viterbi Algorithm by decoding the received vector y =
1101110110010111 using the trellis of Figure 14.4. Recall that for the code C 1 , two bits
are received at each clock cycle; y is listed at the top of Figure 14.4 for convenience. The
trellis for Step I is given in Figure 14.5. Again the states are a = 00, b = 01, c = 10, and
d = 11; we denote state s at time i by si . So the zero state at time 0 is a0 .
For Step II, S(a, 1) = {a0 a1 }, S(b, 1) = ∅, S(c, 1) = {a0 c1 }, and S(d, 1) = ∅. From this
we can compute S(s, 2) using the rules of Step III obtaining S(a, 2) = {a0 a1 a2 }, S(b, 2) =
{a0 c1 b2 }, S(c, 2) = {a0 a1 c2 }, and S(d, 2) = {a0 c1 d2 }; notice that the paths listed in S(s, 2)
[Figure 14.5: Weights in the truncated trellis for the encoder and received vector of Figure 14.4.]

[Figure 14.6: Viterbi Algorithm for the encoder and received vector of Figure 14.4.]
begin with either a0 a1 or a0 c1 , which are the paths in S(s ′ , 1). These paths are given
in Figure 14.6. Consider how to construct S(a, 3). Looking at the paths in each S(s, 2),
we see that there are only two possibilities: a0 a1 a2 followed by the edge a2 a3 giving the
path a0 a1 a2 a3 and a0 c1 b2 followed by the edge b2 a3 giving the path a0 c1 b2 a3 . However,
the former has weight 5 while the latter has weight 2 and hence S(a, 3) = {a0 c1 b2 a3 }.
Computing the other S(s, 3) is similar. We continue inductively until S(a, 8) is computed.
All the paths are given in Figure 14.6; to make computations easier we have given the path
weight under the end vertex of the path. Notice that some of the paths stop (e.g. a0 a1 c2 ).
That is because when one more edge is added to the path, the resulting path has higher
weight than other paths ending at the same vertex.
To complete Step IV of the algorithm, we note that S(a, 8) contains the lone path P =
a0 c1 d2 d3 b4 c5 d6 b7 a8 of weight 2. The codeword c P , obtained by tracing P in Figure 14.4, is
1101100100010111, which differs in two places (the weight of path P) from the received
vector. All other paths through the trellis have weight 3 or more and so the corresponding
codewords are distance at least 3 from y. As solid edges come from input 0 and dashed
edges come from input 1, the six inputs leading to c P , obtained by tracing the first six edges
of P in Figure 14.4, give the message 111011.
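Steps I–IV can also be sketched in code. The following Python (our own illustration, not from the book; the name `viterbi_G1` is ours) runs nearest-neighbor Viterbi decoding for the encoder G1 and reproduces the result of Example 14.2.5.

```python
def viterbi_G1(y_bits, L):
    # Nearest-neighbor Viterbi decoding for G1 = [1 + D + D^2, 1 + D^2] (M = 2).
    # Survivors map a state (s1, s2) to (path weight, message so far); ties are
    # broken by keeping the first path found, which suffices for this example.
    M = 2
    surv = {(0, 0): (0, [])}              # at time 0 only the zero state is used
    for i in range(L + M):
        y0, y1 = y_bits[2 * i], y_bits[2 * i + 1]
        nxt = {}
        for (s1, s2), (w, msg) in surv.items():
            for x in ([0, 1] if i < L else [0]):   # flush with 0s after time L
                e0, e1 = x ^ s1 ^ s2, x ^ s2       # edge label
                cand = (w + (e0 != y0) + (e1 != y1), msg + [x])
                key = (x, s1)                      # next state
                if key not in nxt or cand[0] < nxt[key][0]:
                    nxt[key] = cand
        surv = nxt
    w, msg = surv[(0, 0)]                 # survivor ending in the zero state
    return w, "".join(map(str, msg[:L]))

y = [int(b) for b in "1101110110010111"]
print(viterbi_G1(y, 6))                   # -> (2, '111011'), as in Example 14.2.5
```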
Exercise 815 Verify that the weights given in the truncated trellis of Figure 14.5 are
correct.
Exercise 816 Verify that the Viterbi Algorithm described in Example 14.2.5 is correctly
summarized by Figure 14.6.
Exercise 817 Use the Viterbi Algorithm to decode the following received vectors sent using
the (2, 1) convolutional code C 1 with generator matrix G 1 from Example 14.1.1. Draw the
trellis analogous to that of Figure 14.6. Give the path leading to the nearest codeword, the
nearest codeword, the number of errors made in transmission (assuming the codeword you
calculate is correct), and the associated message.
(a) 1100011111111101.
(b) 0110110110011001. (There are two answers.)
Convolutional codes came into more extensive use with the discovery of the Viterbi
Algorithm in 1967. As can be seen from the examples and exercises in this section, the
complexity of the algorithm for an (n, k) convolutional code depends heavily upon the
memory M and upon k. For this reason, the algorithm is generally used only for M and
k relatively small. There are other decoding algorithms, such as sequential decoding, for
convolutional codes. However, each algorithm has its own drawbacks.
14.3 Canonical generator matrices
As we have seen, a convolutional code can have many different generator matrices, including
ones whose entries are rational functions in D that are not polynomial. We have focused on
polynomial generator matrices; among these polynomial generator matrices, the preferred
ones are the canonical generator matrices.
We begin with some requisite terminology. Let G = [gi, j (D)] be a k × n polynomial
matrix. Recall from (14.4) that the degree of the ith row of G is defined to be the maximum
degree of the entries in row i. Define the external degree of G, denoted extdeg G, to be the
sum of the degrees of the k rows of G.
Example 14.3.1 The generator matrices G 1 and G ′1 for the (2, 1) convolutional code C 1 of
Example 14.1.1 have external degrees 2 and 3, respectively. The generator matrix G 2 for
the code C 2 of the same example has external degree 2 + 1 = 3.
Exercise 818 Find the external degrees of the generator matrices G2′, G2′′, and G2′′′ of the
code C2 from Example 14.1.1.
A canonical generator matrix for a convolutional code C is any polynomial generator
matrix whose external degree is minimal among all polynomial generator matrices for C.
By definition, every convolutional code has a canonical generator matrix. This smallest
external degree is called the degree of the code C.
Example 14.3.2 We can show that G 1 is a canonical generator matrix for the (2, 1) code C 1
from Example 14.1.1 as follows. In this case the external degree of any polynomial generator
matrix of C 1 is the maximum degree of its entries. Every other generator matrix G ′′1 can be
obtained from G 1 by multiplying G 1 by p(D)/q(D), where p(D) and q(D) are relatively
prime polynomials. Since we only need to consider polynomial generator matrices, q(D)
must divide both entries of G 1 and hence the greatest common divisor of both entries of
G 1 . As the latter is 1, q(D) = 1 and G ′′1 is obtained from G 1 by multiplying the entries by
the polynomial p(D). If p(D) = 1, then the degrees of each entry in G ′′1 are larger than the
degrees of the corresponding entries in G 1 . Thus G 1 has smallest external degree among
all polynomial generator matrices for C 1 and therefore is canonical.
Exercise 819 Let C be an (n, 1) convolutional code. Prove that a polynomial generator
matrix G of C is canonical if and only if the greatest common divisor of its entries is 1.
Hint: Study Example 14.3.2.
From Exercise 819, one can check relatively easily to see if a generator matrix for an
(n, 1) convolutional code is canonical or not. The process is not so clear for (n, k) codes
with k > 1. It turns out (see Theorem 14.3.7) that a generator matrix is canonical if it is
both basic and reduced, terms requiring more notation.
Let G be a k × n polynomial matrix with k ≤ n. A k × k minor of G is the determinant of
a matrix consisting of k columns of G. G has (n choose k) minors. The internal degree of G, denoted
intdeg G, is the maximum degree of all of the k × k minors of G. A basic polynomial
generator matrix of a convolutional code is any polynomial generator matrix of minimal
internal degree.
Exercise 820 Find the internal degree of each matrix G1, G1′, G2, G2′, G2′′, and G2′′′ from
Example 14.1.1. Notice that the internal degrees are always less than or equal to the external
degrees found in Example 14.3.1 and Exercise 818.
A k × k matrix U with entries in F2 [D] (that is, polynomials in D) is unimodular if its
determinant is 1. A polynomial matrix G is reduced if among all polynomial matrices of
the form U G, where U is a k × k unimodular matrix, G has the smallest external degree.
Since a unimodular matrix can be shown to be a product of elementary matrices, a matrix
is reduced if and only if its external degree cannot be reduced by applying a sequence of
elementary row operations to the matrix.
In [235], six equivalent formulations are given for a matrix to be basic and three are given
for a matrix to be reduced. We present some of these equivalent statements here, omitting
the proofs and referring the interested reader to [235, Appendix A].
Theorem 14.3.3 Let G be a polynomial generator matrix of an (n, k) convolutional code.
The matrix G is basic if and only if any of the following hold.
(i) The greatest common divisor of the k × k minors of G is 1.
(ii) There exists an n × k matrix K with polynomial entries so that G K = Ik .
(iii) If c ∈ F2 [D]n and c = xG, then x ∈ F2 [D]k .
Recall that the matrix K in (ii) is a right inverse of G. Part (iii) of Theorem 14.3.3
states that if G is a basic encoder, then whenever the output is polynomial, so is the input.
In Section 14.5, we will examine the situation where (iii) fails, namely where there is a
polynomial output that comes from a non-polynomial input. Note that the proof that (ii)
implies (iii) is simple. If (ii) holds and c = xG, then x = xIk = xG K = cK ; as c and K
have only polynomial entries, so does x.
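As a concrete instance of (ii) for G1 = [1 + D + D^2  1 + D^2] (k = 1): the right inverse K = [D, 1 + D]^T below was found by inspection for this sketch (it is not given in the text), and the check uses carry-less arithmetic on polynomials encoded as Python ints.

```python
def pmul(a, b):
    # Multiply two polynomials over GF(2); ints with bit i = coefficient of D^i.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# G1 K = (1 + D + D^2) * D + (1 + D^2) * (1 + D) should equal 1 = I_1:
check = pmul(0b111, 0b10) ^ pmul(0b101, 0b11)
print(check)  # -> 1
```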
The degree of a vector v(D) ∈ F2 [D]n is the largest degree of any of its components. Note
that this definition is consistent with the definition of “degree” for a matrix given in (14.4).
Theorem 14.3.4 Let G be a k × n polynomial matrix. Let gi (D) be the ith row of G. The
matrix G is reduced if and only if either of the following hold.
(i) intdeg G = extdeg G.
(ii) For any x(D) = (x1(D), . . . , xk(D)) ∈ F2[D]^k,

    deg(x(D)G) = max_{1≤i≤k} {deg xi(D) + deg gi(D)}.
Part (ii) is called the predictable degree property of reduced matrices.
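The predictable degree property can be spot-checked for the canonical (hence reduced) matrix G1, using the input x = 1 + D + D^3 + D^5 of Example 14.1.2; the int encoding of polynomials is our convention for this sketch.

```python
def pmul(a, b):
    # GF(2)[D] multiplication; polynomials as ints (bit i = coefficient of D^i)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def deg(a):
    return a.bit_length() - 1

rows = (0b111, 0b101)        # G1 = [1+D+D^2  1+D^2]; its single row has degree 2
x = 0b101011                 # x(D) = 1 + D + D^3 + D^5, degree 5
lhs = max(deg(pmul(x, g)) for g in rows)   # deg(x(D) G1)
rhs = deg(x) + 2                           # deg x1(D) + deg g1(D)
print(lhs, rhs)              # -> 7 7
```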
Example 14.3.5 Notice that Exercise 819 shows that a polynomial generator matrix of an
(n, 1) convolutional code is canonical if and only if it is basic. If G is any 1 × n polynomial
matrix, then the 1 × 1 minors of G are precisely the polynomial entries of the matrix. So G
is reduced by Theorem 14.3.4. This shows that for an (n, 1) code, a polynomial generator
matrix is canonical if and only if it is basic and reduced.
We wish to show in general that a polynomial generator matrix for any (n, k) convolutional
code is canonical if and only if it is basic and reduced. Exercise 820 hints at part of the
following, which will be necessary for our result.
Lemma 14.3.6 Let G be a k × n polynomial matrix over F2 (D) with k ≤ n. Let N be a
nonsingular k × k polynomial matrix with entries in F2 [D]. The following hold:
(i) intdeg N G = intdeg G + deg det N .
(ii) intdeg G ≤ intdeg N G. Equality holds in this expression if and only if N is unimodular.
(iii) intdeg G ≤ extdeg G.
Proof: To prove (i), we observe that the k × k submatrices of N G are precisely the k × k
submatrices of G each multiplied on the left by N . Thus the k × k minors of N G are exactly
the k × k minors of G each multiplied by det N . This proves (i). Part (ii) follows from (i).
For (iii), suppose the degree of the ith row of G is m i . Then extdeg G = m 1 + m 2 + · · · +
m k . Any k × k minor of G is a sum of products of entries of G with one factor in each
product from each row. Hence the highest degree product is at most m 1 + m 2 + · · · + m k ,
implying that this is the maximum degree of the minor. Thus intdeg G ≤ extdeg G.
Theorem 14.3.7 A polynomial generator matrix of an (n, k) convolutional code C is canonical if and only if it is basic and reduced.
Proof: Let G be a canonical generator matrix for C. Since the basic generator matrices have a common internal degree, let this degree be d0 . Choose from among the
basic generator matrices a matrix G 0 whose external degree is as small as possible. If
U is any unimodular matrix, then intdeg U G 0 = intdeg G 0 = d0 by Lemma 14.3.6(ii). By
definition of G 0 , extdeg U G 0 ≥ extdeg G 0 (as U G 0 generates C), implying that G 0 is reduced. Since G 0 has the smallest possible internal degree of any polynomial generator
matrix of C, intdeg G 0 ≤ intdeg G. But intdeg G ≤ extdeg G by Lemma 14.3.6(iii) and
extdeg G ≤ extdeg G 0 by definition as G is canonical. Thus
    intdeg G0 ≤ intdeg G ≤ extdeg G ≤ extdeg G0.        (14.5)
As G 0 is reduced, intdeg G 0 = extdeg G 0 by Theorem 14.3.4(i). Therefore we have equality
everywhere in (14.5). Thus intdeg G = intdeg G 0 = d0 showing that G has minimal internal
degree among all polynomial generator matrices of C, implying that G is basic also. As
intdeg G = extdeg G, G is reduced by Theorem 14.3.4(i).
Now suppose that G is both a basic and a reduced polynomial generator matrix of C.
Let G 0 be any other polynomial generator matrix of C. By Lemma 14.3.6, extdeg G 0 ≥
intdeg G 0 . Since G is basic, intdeg G 0 ≥ intdeg G. As G is reduced, intdeg G = extdeg G
by Theorem 14.3.4(i). By combining these inequalities, extdeg G 0 ≥ extdeg G implying
that G is canonical.
Exercise 821 Which of the polynomial generator matrices G2, G2′, G2′′, and G2′′′ of the
(4, 2) convolutional code C 2 from Example 14.1.1 are basic? Which are reduced? Which
are canonical?
The proof of Theorem 14.3.7 shows that the minimal internal degree of any polynomial
generator matrix of an (n, k) convolutional code C is also the minimal external degree of
any polynomial generator matrix of C, that is, the sum of the row degrees of a canonical
generator matrix. Amazingly, the set of row degrees is unique for a canonical generator
matrix. To prove this, we need the following lemma.
Lemma 14.3.8 Let G be a canonical generator matrix for an (n, k) convolutional code C.
Reorder the rows of G so that the row degrees satisfy m 1 ≤ m 2 ≤ · · · ≤ m k , where m i is the
degree of the ith row. Let G ′ be any polynomial generator matrix for C. Analogously reorder
its rows so that its row degrees m i′ satisfy m ′1 ≤ m ′2 ≤ · · · ≤ m ′k . Then m i ≤ m i′ for
1 ≤ i ≤ k.
Proof: Let gi (D) be the ith row of G with deg gi (D) = m i . Analogously let gi′ (D) be
the ith row of G ′ with deg gi′ (D) = m i′ . Suppose the result is false. Then there exists an
integer j such that m 1 ≤ m ′1 , . . . , m j ≤ m ′j but m j+1 > m ′j+1 . Let 1 ≤ i ≤ j + 1. Then
deg gi′ (D) < m j+1 . Since gi′ (D) is a polynomial codeword, that is a “polynomial output,”
and G is basic, there exists a “polynomial input” xi (D) = (xi,1 (D), . . . , xi,k (D)) such that
gi′ (D) = xi (D)G by Theorem 14.3.3(iii). As G is also reduced, the “predictable degree
property” of Theorem 14.3.4(ii) implies that:
    mi′ = deg gi′(D) = deg(xi(D)G) = max_{1≤ℓ≤k} {deg xi,ℓ(D) + deg gℓ(D)}
                                   = max_{1≤ℓ≤k} {deg xi,ℓ(D) + mℓ}.
As deg gi′ (D) < m j+1 , this implies that deg xi,ℓ (D) = −∞ for ℓ ≥ j + 1. In other words
xi,ℓ (D) = 0 for ℓ ≥ j + 1. Since gi′ (D) = xi (D)G, gi′ (D) must be a polynomial combination
of the first j rows of G for 1 ≤ i ≤ j + 1. Hence the first j + 1 rows of G ′ are dependent
over F2 (D), a contradiction.
This lemma immediately yields the following.
Theorem 14.3.9 Let C be an (n, k) convolutional code. The set of row degrees is the same
for all canonical generator matrices for C.
This unique set of row degrees of a canonical generator matrix for C is called the set of
Forney indices of C. The sum of the Forney indices is the external degree of a canonical
generator matrix, which is the degree of C. (The degree of C is sometimes called the
constraint length or the overall constraint length.) If m is the degree of the code, the
code is often termed an (n, k, m) code. The Viterbi Algorithm requires 2^m states when
a canonical encoder is used. If we construct a physical encoder for C from a canonical
generator matrix, the memory of the encoder is the largest Forney index. Since this index
is unique, it is called the memory M of the code. Note that by Lemma 14.3.8, the memory
of a physical encoder derived from any polynomial generator matrix is at least the memory
of the code. For example, the memories of the physical encoders G2, G2′, G2′′, and G2′′′
for C2 of Example 14.1.1 are 2, 1, 1, and 2, respectively; C2 has memory 1.
Example 14.3.10 The (4, 2) code C2 of Example 14.1.1 has canonical generator matrix G2′′
by Exercise 821. Thus the Forney indices for C 2 are 0 and 1. The degree and memory of C 2
are therefore each equal to 1, and C 2 is a (4, 2, 1) code.
Exercise 822 Let C3 and C4 be (4, 3) convolutional codes with generator matrices

         [  1      1     1   1 ]              [ 1   0   0     1    ]
    G3 = [ 1+D     1     0   0 ]   and   G4 = [ 0   1   0     1    ] ,
         [  0     1+D    1   0 ]              [ 0   0   1   1+D^2  ]

respectively.
(a) Verify that G 3 and G 4 are canonical generator matrices.
(b) Give the internal degree, external degree, degree, Forney indices, and memories of C 3
and C 4 .
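The basic-and-reduced criteria can be tested mechanically. Taking G3 to be the matrix with rows [1 1 1 1], [1+D 1 0 0], [0 1+D 1 0] (our reading of Exercise 822), the sketch below computes all 3 × 3 minors over F2[D], their greatest common divisor, and the internal and external degrees; this is an illustration of Theorems 14.3.3(i) and 14.3.4(i), not a substitute for the exercise.

```python
from itertools import combinations, permutations

def pmul(a, b):                 # GF(2)[D] multiplication, polynomials as ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def deg(a):
    return a.bit_length() - 1   # the zero polynomial is reported as degree -1

def pmod(a, b):                 # remainder of a divided by b in GF(2)[D]
    while deg(a) >= deg(b):
        a ^= b << (deg(a) - deg(b))
    return a

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a

def det(M):                     # permutation expansion; signs vanish over GF(2)
    d = 0
    for p in permutations(range(len(M))):
        t = 1
        for i, j in enumerate(p):
            t = pmul(t, M[i][j])
        d ^= t
    return d

G3 = [[0b01, 0b01, 0b01, 0b01],     # [ 1    1   1  1 ]
      [0b11, 0b01, 0b00, 0b00],     # [ 1+D  1   0  0 ]
      [0b00, 0b11, 0b01, 0b00]]     # [ 0   1+D  1  0 ]

minors = [det([[G3[i][c] for c in cols] for i in range(3)])
          for cols in combinations(range(4), 3)]
g = 0
for m in minors:
    g = pgcd(g, m)
intdeg = max(deg(m) for m in minors)
extdeg = sum(max(deg(e) for e in row) for row in G3)
print(g, intdeg, extdeg)   # -> 1 2 2: gcd 1 (basic) and intdeg = extdeg (reduced)
```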
Since every convolutional code has a canonical generator matrix, it is natural to ask how
to construct such a generator matrix. A method for doing so, starting with any polynomial
generator matrix, can be found in [235, Appendix B]. One reason for the interest in canonical
generator matrices is that they lead directly to the construction of physical encoders for the
code that have a minimum number of delay elements, as hinted at prior to Example 14.3.10.
This minimum number of delay elements equals the degree of the code; see [235, Section 5].
14.4
Free distance
In this section we examine a parameter, called the free distance, of a convolutional code that
gives an indication of the code’s ability to correct errors. The free distance plays the role
in a convolutional code that minimum distance plays in a block code, but its interpretation
is not quite as straightforward as in the block code case. The free distance is a measure of
the ability of the decoder to decode a received vector accurately: in general, the higher the
free distance, the lower the probability that the decoder will make an error in decoding a
received vector. The free distance is usually difficult to compute and so bounds on the free
distance become important.
To define free distance, we first define the weight of an element of F2 (D). Every element
f (D) ∈ F2 (D) is a quotient p(D)/q(D) where p(D) and q(D) are in F2 [D]. In addition,
f(D) = p(D)/q(D) has a one-sided Laurent series expansion f(D) = Σ_{i≥a} f_i D^i, where
a is an integer, possibly negative, and f_i ∈ F2 for i ≥ a. The Laurent series is actually a
power series (a = 0) if q(0) ≠ 0. We define the weight of f(D) to be the number of nonzero
coefficients in the Laurent series expansion of f (D). We denote this value, which may be
infinite, by wt( f (D)).
Example 14.4.1 To illustrate this weight, notice that wt(1 + D + D^9) = 3, whereas
wt((1 + D)/(1 + D^3)) = wt(1 + D + D^3 + D^4 + D^6 + D^7 + ···) = ∞.
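The expansion in Example 14.4.1 can be generated mechanically. The sketch below (our own; polynomials are encoded as Python ints with bit i the coefficient of D^i) computes the first coefficients of the power-series expansion of p(D)/q(D) over F2 when q(0) = 1.

```python
def series_coeffs(p, q, n):
    # First n coefficients of the power-series expansion of p(D)/q(D) over
    # GF(2); requires q(0) = 1.  Uses f_i = p_i + sum_{j>=1} q_j f_{i-j} mod 2.
    f = []
    for i in range(n):
        c = (p >> i) & 1
        for j in range(1, i + 1):
            c ^= ((q >> j) & 1) & f[i - j]
        f.append(c)
    return f

print(series_coeffs(0b11, 0b1001, 8))   # (1+D)/(1+D^3) -> [1, 1, 0, 1, 1, 0, 1, 1]
```

The ones in the output match the terms 1 + D + D^3 + D^4 + D^6 + D^7 of the example.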
When does an element of F2(D) have finite weight? If f(D) = Σ_{i≥a} f_i D^i has finite
weight, then multiplying by a high enough power of D produces a polynomial with the
same weight. Hence f (D) must have been a polynomial divided by some nonnegative
integer power of D.
Lemma 14.4.2 An element f (D) ∈ F2 (D) has finite weight if and only if either f (D) is a
polynomial or a polynomial divided by a positive integer power of D.
We extend the definition of the weight of an element of F2(D) to an n-tuple u(D) ∈ F2(D)^n by
saying that the weight of u(D) = (u1(D), . . . , un(D)) ∈ F2(D)^n is the sum Σ_{i=1}^{n} wt(ui(D))
of the weights of each component of u(D). The distance between two elements u(D) and
v(D) of F2 (D)n is defined to be wt(u(D) − v(D)). The free distance dfree of an (n, k)
convolutional code is defined to be the minimum distance between distinct codewords
of C. As C is linear over F2 (D), the free distance of C is the minimum weight of any
nonzero codeword of C. If the degree m and free distance d are known, then C is denoted
an (n, k, m, d) convolutional code.
Example 14.4.3 We can show that the code C1 of Example 14.1.1 has free distance 5,
as follows. Using the generator matrix G1 for C1, we see that every codeword of C1 has
the form c(D) = (f(D)(1 + D + D^2), f(D)(1 + D^2)), where f(D) = p(D)/q(D) is in
F2(D). Suppose c(D) has weight between 1 and 4, inclusive. As each component must have
finite weight, by Lemma 14.4.2, we may multiply by some power of D, which does not
change weight, and thereby assume that c(D) has polynomial components. Reducing f(D)
so that p(D) and q(D) are relatively prime, q(D) must be a divisor of both 1 + D + D^2
and 1 + D^2. Thus q(D) = 1. Hence if the free distance of C1 is less than 5, there exists a
codeword c = (f(D)(1 + D + D^2), f(D)(1 + D^2)) of weight between 1 and 4 with f(D)
a polynomial. If f(D) has only one term, wt(c(D)) = 5. Suppose that f(D) has at least
two terms. Notice that both wt(f(D)(1 + D + D^2)) ≥ 2 and wt(f(D)(1 + D^2)) ≥ 2. Since
1 ≤ wt(c(D)) ≤ 4, wt(f(D)(1 + D + D^2)) = wt(f(D)(1 + D^2)) = 2. But wt(f(D)(1 + D^2)) = 2
only holds if f(D) = D^a Σ_{i=0}^{b} D^{2i} for some nonnegative integer a and
some positive integer b. In this case wt(f(D)(1 + D + D^2)) > 2, which is a contradiction.
So dfree > 4. As wt((1 + D + D^2, 1 + D^2)) = 5, dfree = 5. Thus C1 is a (2, 1, 2, 5) code.
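The value dfree = 5 can also be checked numerically by a bounded search over polynomial inputs (sufficient here because finite-weight codewords reduce to polynomial ones, though the degree cutoff below is a choice of this sketch, not a proof).

```python
def pmul(a, b):
    # GF(2)[D] multiplication; polynomials as ints (bit i = coefficient of D^i)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

rows = (0b111, 0b101)   # G1 = [1+D+D^2  1+D^2]
dfree = min(sum(bin(pmul(f, g)).count("1") for g in rows)
            for f in range(1, 1 << 7))   # all nonzero inputs of degree < 7
print(dfree)            # -> 5
```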
Exercise 823 Let C5 be the (2, 1) convolutional code with generator matrix

    G5 = [1  1 + D].
Show that the free distance of C 5 is 3.
Exercise 824 Show that the free distance of the code C2 from Example 14.1.1 is 4. Hint:
Use the generator matrix G2′.
Example 14.4.3 and Exercises 823 and 824 illustrate the difficulty in finding the free
distance of an (n, k) convolutional code even for very small n and k. There is, however,
an upper bound on dfree , which we give in Theorem 14.4.10. For this it is helpful to give
another way to generate a certain part of the code.
Let G be a fixed k × n generator matrix of an (n, k) convolutional code C where the
maximum row degree of G is M, the memory of the encoder G. There are M + 1 binary
k × n matrices B0, B1, . . . , BM such that G = B0 + B1 D + B2 D^2 + ··· + BM D^M. Form
the binary matrix B(G), with an infinite number of rows and columns, by

    B(G) = [ B0  B1  B2  ···  BM
                 B0  B1  B2  ···  BM
                     B0  B1  B2  ···  BM
                             ···          ] ,
where blank entries are all 0. Similarly, for x a k-tuple of polynomials with maximum
degree N, there are N + 1 binary k-tuples X0, X1, . . . , XN such that x = X0 +
X1 D + ··· + XN D^N. Let b(x) be the binary vector of length k(N + 1) formed by interleaving (X0, X1, . . . , XN). The codeword c = xG, when interleaved, is precisely b(x)B(G).
Also the weight of c is the Hamming weight of b(x)B(G).
Example 14.4.4 Using the encoder G 1 from Example 14.1.1 for the (2, 1) code C 1 , we
have B0 = [1 1], B1 = [1 0], and B2 = [1 1]. So,
    B(G1) = [ 1 1 1 0 1 1
                  1 1 1 0 1 1
                      1 1 1 0 1 1
                          ···        ] .
Let x = 1 + D + D^3 + D^5. Writing this in binary, interleaving being unnecessary as k = 1,
gives b(x) = 110101. Computing b(x)B(G1) yields the weight 8 vector 1101010010001011,
which corresponds to xG1 = (1 + D^4 + D^6 + D^7, 1 + D + D^2 + D^7) interleaved, having
weight 8. This agrees with the results of Example 14.1.2.
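The computation b(x)B(G1) of Example 14.4.4 can be replayed in a few lines (an illustration, not from the text; the truncation of B(G1) to the rows actually used is our choice).

```python
# Multiply b(x) by a truncated corner of B(G1) over GF(2), bit by bit; k = 1,
# so no interleaving of the input is needed.
blocks = [1, 1, 1, 0, 1, 1]      # B0 B1 B2 = [1 1][1 0][1 1]: one row of B(G1)
bx = [1, 1, 0, 1, 0, 1]          # b(x) for x = 1 + D + D^3 + D^5
n, M = 2, 2
c = [0] * (n * (len(bx) + M))    # codeword length n(N + 1 + M) = 16
for i, bit in enumerate(bx):     # row i of B(G1) is `blocks` shifted by n*i
    if bit:
        for j, v in enumerate(blocks):
            c[n * i + j] ^= v
word = "".join(map(str, c))
print(word, sum(c))              # -> 1101010010001011 8
```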
Exercise 825 Using the encoder G 2 from Example 14.1.1 for the (4, 2) code C 2 , do the
following:
(a) Find B0 , B1 , and B2 .
(b) Give the first ten rows of B(G 2 ).
(c) Suppose x = (1 + D + D^3, 1 + D^2 + D^3 + D^4). What is b(x)?
(d) Compute b(x)B(G 2 ) and check your result with Example 14.1.3.
Exercise 826 Using the encoder G 3 from Exercise 822 for the (4, 3) code C 3 , do the
following:
(a) Find B0 and B1 .
(b) Give the first nine rows of B(G 3 ).
(c) Suppose x = (1 + D + D^2, 1 + D^2, D). What is b(x)?
(d) Compute b(x)B(G 3 ). From this find xG 3 written as a 4-tuple of polynomials.
Exercise 827 Using the encoder G 4 from Exercise 822 for the (4, 3) code C 4 , do the
following:
(a) Find B0 , B1 , and B2 .
(b) Give the first nine rows of B(G 4 ).
(c) Suppose x = (1 + D^2, D + D^2, 1 + D + D^2). What is b(x)?
(d) Compute b(x)B(G 4 ). From this find xG 4 written as a 4-tuple of polynomials.
Notice that the first block of B(G), that is [B0 B1 · · · B M · · ·] consisting of the first k rows
of B(G), corresponds to the k rows of G interleaved. The next block of k rows corresponds
to the k rows of G multiplied by D and interleaved. In general, the ith block of k rows
corresponds to the k rows of G multiplied by D i−1 and interleaved. Thus the rows of B(G)
span a binary code (of infinite dimension and length) whose vectors correspond to the
codewords of C arising from polynomial inputs. If G is basic, then all polynomial outputs
arise from polynomial inputs by Theorem 14.3.3. This proves the following result.
Theorem 14.4.5 Let G be a basic generator matrix for an (n, k) convolutional code. Then
every codeword with polynomial components (interleaved) corresponds to a finite linear
combination of the rows of B(G).
The free distance of C is the weight of some finite weight codeword. Such codewords
have components of the form p(D)/D i for some polynomial p(D) and nonnegative integer
i by Lemma 14.4.2. Multiplying such a codeword by a high enough power of D, which
does not change its weight, gives a codeword with polynomial components having weight
equal to the free distance. By Theorem 14.4.5, if G is a basic generator matrix of C, the
free distance of C is the weight of some finite linear combination of the rows of B(G). This
observation leads to the following notation. Let L be a nonnegative integer; let C (L) be the
set of polynomial codewords of the (n, k) convolutional code C with components of degree
L or less. This set C (L) is in fact a binary linear code of length n(L + 1); shortly we will find
its dimension. We clearly have the following result as some codeword of minimum weight
has polynomial components and hence is in some C (L) .
Theorem 14.4.6 Let C be an (n, k) convolutional code with free distance dfree . Let C (L) have
minimum weight dL. Then dfree = min_{L≥0} dL. Furthermore, some C (L) contains a codeword
whose weight is dfree .
The codewords of C (L) , interleaved, correspond to the linear combinations of the rows
of B(G) whose support is entirely in the first n(L + 1) columns of B(G), where G is a
basic generator matrix. By Theorem 14.4.6, if we look at the binary code generated by
a big enough “upper left-hand corner” of B(G) where G is basic, we will find a binary
codeword whose minimum weight is the free distance of C. If G is also reduced, and hence
is canonical, then we can in fact find how many rows of B(G) generate the binary code
C (L) (interleaved), as the proof of the next result, due to Forney [88], shows. Let k L be the
dimension of C (L) as a binary code.
Theorem 14.4.7 Let C be an (n, k) convolutional code with Forney indices m 1 , . . . , m k .
Then the binary dimension kL of C (L) satisfies

    kL = Σ_{i=1}^{k} max{L + 1 − mi, 0}.
Proof: Let G be a canonical polynomial generator matrix for C with rows g1 (D), . . . , gk (D).
Order the rows so that deg gi (D) = m i where m 1 ≤ m 2 ≤ · · · ≤ m k . Let c(D) be
any polynomial codeword with components of degree L or less. As G is basic, by
Theorem 14.3.3(iii), there is a polynomial k-tuple x(D) = (x1 (D), . . . , xk (D)) with
c(D) = x(D)G. As G is reduced, by Theorem 14.3.4(ii) deg c(D) = max1≤i≤k (deg xi (D) +
deg gi (D)). In particular, deg xi (D) + deg gi (D) = deg xi (D) + m i ≤ L, implying that the
set B L = {D j gi (D) | 1 ≤ i ≤ k, j ≥ 0, j + m i ≤ L} spans the binary code C (L) . Furthermore, as {gi (D) | 1 ≤ i ≤ k} is linearly independent over F2 (D), B L is linearly independent
over F2 . Therefore B L is a binary basis of C (L) . For each i, j ranges between 0 and L − m i
as long as the latter is nonnegative. The result follows.
The result of this theorem shows how many rows of B(G) we need to generate the binary
code C (L) (in interleaved form); the proof indicates which rows of B(G) actually to take.
This requires G to be canonical.
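The dimension formula of Theorem 14.4.7 is simple enough to tabulate directly. A small illustrative sketch (plain Python; the function name is ours, not the book's):

```python
# Dimension k_L of the truncated code C^(L) from Theorem 14.4.7,
# given the Forney indices m_1, ..., m_k of the convolutional code.
def k_L(forney_indices, L):
    return sum(max(L + 1 - m, 0) for m in forney_indices)

# Indices m_1 = 0, m_2 = 1 (as in Example 14.4.9): k_L = (L + 1) + L = 2L + 1.
print([k_L([0, 1], L) for L in range(5)])   # [1, 3, 5, 7, 9]
```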
Example 14.4.8 Consider the (2, 1) convolutional code C 5 from Exercise 823 with generator matrix G 5 = [1 1 + D]. Multiplying G 5 by 1 + D, we obtain the generator
matrix G ′5 = [1 + D 1 + D 2 ], which is not canonical by Exercise 819. Using G ′5 , we
have B0 = [1 1], B1 = [1 0], and B2 = [0 1]. Thus

B(G ′5 ) =
    1 1 1 0 0 1
        1 1 1 0 0 1
            1 1 1 0 0 1
                . . .
Notice that the codeword (1/(1 + D))G ′5 = (1, 1 + D) has weight 3 and is contained in
C 5(1) . So we cannot use B(G ′5 ) to find all of C 5(1) . This is because G ′5 is not canonical. Notice
also that there is no polynomial codeword of weight 3 spanned by rows of B(G ′5 ), as the
rows all have even weight. The free distance of C 5 is 3, as Exercise 823 shows. So we cannot
use B(G ′5 ) to discover the free distance of C 5 . However, the encoder G 5 is canonical by
Exercise 819. If we form B(G 5 ), we obtain
B(G 5 ) =
    1 1 0 1
        1 1 0 1
            1 1 0 1
                . . .
The Forney index of C 5 is m 1 = 1. The code C 5(L) has length 2(L + 1), binary dimension
k L = L + 1 − m 1 = L, and minimum distance 3 for all L ≥ 1. The interleaved codewords
of C 5(L) have a binary basis consisting of the first L rows of B(G 5 ).
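The "upper left-hand corner" computation behind Theorems 14.4.6 and 14.4.7 can be mechanized by brute force. The sketch below is illustrative only (the function names are ours); it assumes a canonical generator matrix with nonzero polynomial rows, stores each polynomial as a Python int with bit i the coefficient of D^i, builds the shifted rows D^j g_i(D) with j + m_i ≤ L, interleaves them, and takes the minimum weight of their span. Minimizing over L recovers the free distance; for G 5 = [1 1 + D] it returns 3.

```python
def deg(p):
    return p.bit_length() - 1

def interleave(row, L, n):
    # Coefficient of D^t in component s goes to interleaved position n*t + s.
    return tuple((row[s] >> t) & 1 for t in range(L + 1) for s in range(n))

def d_L(gens, L):
    # gens: rows g_i(D) of a canonical generator matrix, each a tuple of
    # n polynomials stored as ints (bit j = coefficient of D^j).
    n = len(gens[0])
    rows = []
    for g in gens:
        m = max(deg(p) for p in g if p)      # Forney index of this row
        for j in range(L - m + 1):           # shifts D^j g_i(D) with j + m_i <= L
            rows.append(tuple(p << j for p in g))
    vecs = [interleave(r, L, n) for r in rows]
    if not vecs:
        return None
    weights = []
    for combo in range(1, 2 ** len(vecs)):   # every nonzero F_2 combination
        acc = [0] * (n * (L + 1))
        for i, v in enumerate(vecs):
            if (combo >> i) & 1:
                acc = [a ^ b for a, b in zip(acc, v)]
        weights.append(sum(acc))
    return min(weights)

G5 = [(0b1, 0b11)]                           # G_5 = [1  1+D]
print(min(d_L(G5, L) for L in range(1, 4)))  # 3 = free distance of C_5
```

The exponential enumeration is only feasible for small L; it is meant as a check on small examples, not as a practical algorithm.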
Example 14.4.9 The generator matrix G ′′2 is a canonical generator matrix of C 2 from
Example 14.1.1. The Forney indices are m 1 = 0 and m 2 = 1; also

B0 =
    1 1 1 1
    0 1 0 1

and B1 =
    0 0 0 0
    0 1 1 0 ,
567
14.4 Free distance
giving

B(G ′′2 ) =
    1 1 1 1 0 0 0 0
    0 1 0 1 0 1 1 0
            1 1 1 1 0 0 0 0
            0 1 0 1 0 1 1 0
                    1 1 1 1 0 0 0 0
                    0 1 0 1 0 1 1 0
                            . . .
By Theorem 14.4.7, k L = max{L + 1 − m 1 , 0} + max{L + 1 − m 2 , 0} = max{L +
1, 0} + max{L , 0}. So k L = 2L + 1 for L ≥ 0. Thus C 2(L) , as a binary code, is a [4(L +
1), 2L + 1] code; we can see that a basis of the interleaved version of C 2(L) is the first 2L + 1
rows of B(G ′′2 ). As the rows of B(G ′′2 ) are of even weight, C 2(L) , as a binary code, is an even
code. It is not difficult to see that the span of the first 2L + 1 rows of B(G ′′2 ) has no vector
of weight 2. Thus, as a binary code, C 2(L) is a [4(L + 1), 2L + 1, 4] code for all L ≥ 0. By
Theorem 14.4.6, C 2 has free distance 4; see also Exercise 824.
Exercise 828 In Example 14.4.9, a basis for C 2(L) was found for the code C 2 of Example
14.1.1 using the canonical generator matrix G ′′2 .
(a) By Example 14.4.9, C 2(2) has dimension 5, and the first five rows of B(G ′′2 ) form a binary
basis of C 2(2) . In Exercise 825(b), the first ten rows of B(G 2 ) were found. Find a binary
basis of C 2(2) using binary combinations of the first eight rows of B(G 2 ). Note how much
simpler it was to find a basis of C 2(2) using B(G ′′2 ).
(b) What property of G 2 allows you to find a basis for C 2(2) using B(G 2 )? What property of
G 2 is missing that causes the basis to be more difficult to find?
(c) Give the first ten rows of B(G ′′′2 ). Give a maximum set of independent vectors found as
binary combinations of rows of B(G ′′′2 ) which are in C 2(2) . What property is G ′′′2 missing
that causes it not to contain a basis of C 2(2) ?
Exercise 829 Show that the free distance of the code C 3 from Exercise 822 is 3.
Exercise 830 Show that the free distance of the code C 4 from Exercise 822 is 2.
If we let d(n, k) be the largest minimum distance possible for a binary [n, k] code, then
an immediate consequence of Theorem 14.4.6 is the following bound.
Theorem 14.4.10 Let C be an (n, k) convolutional code with Forney indices m 1 , . . . , m k .
Let k L be defined by Theorem 14.4.7. Then

    dfree ≤ min_{L≥0} d(n(L + 1), k L ).

We can use bounds, such as those in Chapter 2, or tables, such as those found in [32], to
give values of d(n(L + 1), k L ).
Example 14.4.11 A (4, 3, 2) convolutional code has Forney indices that sum to m = 2.
There are two possibilities: m 1 = 0, m 2 = m 3 = 1 or m 1 = m 2 = 0, m 3 = 2.
Consider first the case m 1 = 0, m 2 = m 3 = 1. By Theorem 14.4.7, k L = 3L + 1 for all
L. Hence Theorem 14.4.10 shows that
    dfree ≤ min{d(4, 1), d(8, 4), d(12, 7), . . . }.

The values d(4, 1) = d(8, 4) = d(12, 7) = 4 can be found in [32] or deduced from
Table 2.1. Thus dfree is at most 4. In fact it can be shown that dfree is at most 3 for this
code; see [235, Theorem 7.10]. C 3 from Exercise 822 is a (4, 3, 2, 3) code with these Forney indices; see Exercise 829.
Next consider the case m 1 = m 2 = 0, m 3 = 2. By Theorem 14.4.7, k0 = 2 and k L =
3L + 1 for all L ≥ 1. So,
    dfree ≤ min{d(4, 2), d(8, 4), d(12, 7), . . . }.

Since d(4, 2) is easily seen to be 2, dfree ≤ 2. The code C 4 from Exercise 822 is a (4, 3, 2, 2)
code with these Forney indices; see Exercise 830.
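Once values of d(n, k) are in hand, evaluating the bound of Theorem 14.4.10 is mechanical. A sketch using only the values quoted in this example, d(4, 1) = d(8, 4) = d(12, 7) = 4 and d(4, 2) = 2 (in general one would consult [32] or Table 2.1; the function and table names are ours):

```python
# Upper bound on d_free from Theorem 14.4.10, minimizing over L = 0..L_max.
def free_distance_bound(n, forney, d_table, L_max):
    k_L = lambda L: sum(max(L + 1 - m, 0) for m in forney)
    return min(d_table[(n * (L + 1), k_L(L))] for L in range(L_max + 1))

# Best minimum distances d(n, k) quoted in Example 14.4.11.
d_table = {(4, 1): 4, (8, 4): 4, (12, 7): 4, (4, 2): 2}

print(free_distance_bound(4, [0, 1, 1], d_table, 2))   # 4
print(free_distance_bound(4, [0, 0, 2], d_table, 2))   # 2
```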
Section 4 of [235] contains tables showing bounds on the free distance of (n, k, m) codes
using a slightly weaker bound similar to that of Theorem 14.4.10 for (n, k) = (2, 1), (3, 1),
(3, 2), (4, 1), and (4, 3) with values of m up to 10. These tables also give codes that come
closest to meeting these bounds. The bounds are most often met; when not met, except for
one case, the codes have free distance one less than the value from the bound.
Exercise 831 Let C be a (3, 2, 3) convolutional code.
(a) Find the two possible sets of Forney indices for C.
(b) Find the values of k L for each set of Forney indices found in (a).
(c) Give an upper bound on the free distance of C for each set of Forney indices found
in (a).
14.5
Catastrophic encoders
When choosing an encoder for a convolutional code, there is one class of generator matrices
that must be avoided – the so-called catastrophic encoders. In this section we see the problem
that arises with their use.
We digress a bit before discussing catastrophic encoders. Let G be a k × n matrix over
a field F with k ≤ n. Recall that a right inverse of G is an n × k matrix K with entries in
F such that G K = Ik . By a result of linear algebra G has a right inverse if and only if it
has rank k. In general, if k < n, there are many possible right inverses. Now suppose that
G is the generator matrix of either an [n, k] block code or an (n, k) convolutional code.
Then G has rank k and hence has a right inverse K. If x is a message of length k that was
encoded to produce the codeword c = xG, then x can be recovered from c by observing
that x = xIk = xG K = cK; see also Section 1.11.1.
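For a block code, recovering x from c is one matrix multiplication over F2 . A sketch with a hypothetical [5, 2] generator matrix in standard form (not one of the codes in the text) and a right inverse found by inspection:

```python
# Matrix multiplication over GF(2).
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

# Hypothetical [5, 2] generator matrix; since it begins with I_2,
# a right inverse K with GK = I_2 can be read off directly.
G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]
K = [[1, 0], [0, 1], [0, 0], [0, 0], [0, 0]]

assert mat_mul(G, K) == [[1, 0], [0, 1]]   # G K = I_2
x = [[1, 1]]
c = mat_mul(x, G)                          # encode: c = xG
print(c, mat_mul(c, K))                    # [[1, 1, 1, 0, 1]] [[1, 1]]
```

Because GK = I2 , the final product cK = xGK = x recovers the message, exactly as in the identity x = xIk = xGK = cK above.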
Exercise 832 Consider the binary matrix

G =
    1 1 0 1 1
    1 0 1 1 0 .
(a) Find two 5 × 2 binary right inverses of G. Hint: Most of the entries can be chosen to
be 0.
(b) Suppose that G is the generator matrix for a [5, 2] binary code. Suppose that a message
x ∈ F22 is encoded by xG = c. Use each of the right inverses found in part (a) to find
the message that was encoded to produce 01101.
Exercise 833 Do the following:
(a) Find a 2 × 1 right inverse for the generator matrix G 1 of the (2, 1) convolutional code
C 1 from Example 14.1.1 where one of the entries of the right inverse is 0. Note: The
other entry will be a rational function.
(b) Find a second right inverse for G 1 where both entries are polynomials in D. Note: This
can be done with two polynomials of degree 1.
(c) Suppose that x is encoded so that xG 1 = (1 + D + D 5 , 1 + D 3 + D 4 + D 5 ). Use the
two right inverses of G 1 found in parts (a) and (b) to compute x.
We now return to our discussion of catastrophic encoders. Let x be a message to be
encoded using a generator matrix G for the (n, k) convolutional code C. Let K be a right
inverse for G with entries in F2 (D). The resulting codeword to be transmitted is c = xG.
During transmission, c may be altered by noise with the result that y is received. A decoder
finds a codeword c′ that is close to y, which hopefully actually equals c. One is less interested
in c and more interested in x. Thus x′ = c′ K can be computed with the hope that x′ actually
equals x; this will certainly be the case if c′ = c but not otherwise. Let us investigate what
might happen if c′ ≠ c. Let ec = c′ − c denote the codeword error and ex = x′ − x denote
the message error. Note that ec , being the difference of two codewords of C, must be
a codeword of C. The “message” obtained from the codeword ec is ec K = (c′ − c)K =
x′ − x = ex . One expects that the number of erroneous symbols in the estimate c′ would
have a reasonable connection to the number of erroneous symbols in x′ . In particular, one
would expect that if c′ and c differ in a finite number of places, then x′ and x should also
differ in a finite number of places. In other words, if ec has finite weight, ex should also. If ec
were to have finite weight and ex were to have infinite weight, that would be a “catastrophe.”
Recalling that ec is a codeword, we give the following definition. A generator matrix G
for an (n, k) convolutional code is called catastrophic if there is an infinite weight message
x ∈ F2 (D)k such that c = xG has finite weight. Otherwise, G is called noncatastrophic.
Example 14.5.1 The encoder G ′1 of the (2, 1) code C 1 from Example 14.1.1 is catastrophic
as we now show. One right inverse of G ′1 is

    K = [D/(1 + D)   1]T .
The code C 1 contains the codeword c = (1 + D + D 2 , 1 + D 2 ) (from the generator
matrix G 1 ), whose interleaved binary form is 111011. This codeword has finite weight
5. Computing cK we obtain
    x = cK = (1 + D + D 2 ) · D/(1 + D) + (1 + D 2 ) = 1/(1 + D) = Σ_{i=0}^{∞} D i .
Thus x has infinite weight, while c has finite weight. In other words, if an infinite
string of 1s is input into the encoder G ′1 , the output is the codeword 1110110000 · · · of
weight 5.
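The computation in Example 14.5.1 can be replayed with truncated power series: replace 1/(1 + D) by the all-ones polynomial of large degree N and multiply by the entries of G ′1 . In this sketch (bitmask polynomial arithmetic; the helper name is ours), everything well below the truncation boundary matches the weight-5 codeword (1 + D + D 2 , 1 + D 2 ):

```python
def gf2_poly_mul(a, b):
    # Multiply two F_2[D] polynomials stored as ints (bit i = coeff of D^i).
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

N = 50
x = (1 << N) - 1                       # truncated 1/(1+D) = 1 + D + ... + D^{N-1}
c1 = gf2_poly_mul(x, 0b1001)           # x(D) (1 + D^3)
c2 = gf2_poly_mul(x, 0b1111)           # x(D) (1 + D + D^2 + D^3)
mask = (1 << (N - 5)) - 1              # look well below the truncation boundary
print(bin(c1 & mask), bin(c2 & mask))  # 0b111 0b101, i.e. 1+D+D^2 and 1+D^2
```

Only boundary terms near degree N remain besides these five coefficients, so as N grows the output stays at weight 5 while the input weight grows without bound.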
Exercise 834 Confirm the results of Example 14.5.1 by showing that if an infinite string
of 1s is input into the physical encoder for G ′1 from Exercise 803, the output is the weight
5 codeword 1110110000 · · · .
Let us examine Example 14.5.1 more closely to see one reason why catastrophic encoders are to be avoided. The input x consisting of an infinite string of 1s has output
c = 1110110000 · · · of weight 5. Clearly, the input x′ consisting of an infinite string of 0s
has output c′ that is an infinite string of 0s. Suppose that five errors are made and that c′
is received as 1110110000 · · · . This would be decoded as if no errors were made and the
message sent would be determined to be x, the infinite string of 1s. This differs from the
actual message x′ in every position, certainly a very undesirable result!
Fortunately, a theorem of Massey and Sain [229] makes it possible to decide if an encoder
is catastrophic. Their result gives two equivalent properties for a generator matrix G of an
(n, k) convolutional code to be noncatastrophic. A matrix with entries in F2 (D) is a finite
weight matrix provided all its entries have finite weight. Recall from Lemma 14.4.2 that
each entry of a finite weight matrix is a polynomial or a polynomial divided by a positive
integer power of D.
Theorem 14.5.2 (Massey–Sain) Let G be a polynomial generator matrix for an (n, k)
convolutional code C. The matrix G is a noncatastrophic encoder for C if and only if either
of the following holds:
(i) The greatest common divisor of the k × k minors of G is a power of D.
(ii) G has a finite weight right inverse.
We omit the proof of the Massey–Sain Theorem; see [229, 235]. Notice the clear connection between parts (i) and (ii) of this theorem and the parts (i) and (ii) of Theorem 14.3.3.
In particular, a basic generator matrix and, more particularly, a canonical generator matrix,
which we know exists for every code, is noncatastrophic.
Corollary 14.5.3 Every basic generator matrix is a noncatastrophic generator matrix
for a convolutional code. Every convolutional code has a noncatastrophic generator
matrix.
Example 14.5.4 In Example 14.1.1, we give two generator matrices, G 1 and G ′1 , of C 1 .
In Example 14.5.1, we exhibited an infinite weight input with a finite weight output using
the generator matrix G ′1 . So G ′1 is catastrophic. We can obtain this same result by applying
the Massey–Sain Theorem. The two minors of G ′1 are 1 + D 3 = (1 + D)(1 + D + D 2 )
and 1 + D + D 2 + D 3 = (1 + D)(1 + D 2 ); the greatest common divisor of these minors
is 1 + D, which is not a power of D. (Notice the connection between the finite weight input
that leads to an infinite weight codeword and the greatest common divisor 1 + D of the
minors.) Alternately, suppose that K = [a(D) b(D)]T is a finite weight right inverse of G ′1 .
Then there exist polynomials p(D) and q(D) together with nonnegative integers i and j
such that a(D) = p(D)/D i and b(D) = q(D)/D j . Then as G ′1 K = I1 , (1 + D 3 )a(D) +
(1 + D + D 2 + D 3 )b(D) = 1, or
D j (1 + D 3 ) p(D) + D i (1 + D + D 2 + D 3 )q(D) = D i+ j .
But the left-hand side is a polynomial divisible by 1 + D, while the right-hand side is not.
Thus G ′1 does not have a finite weight right inverse. In Example 14.5.1 we found a right
inverse of G ′1 , but it was not of finite weight.
Turning to the encoder G 1 , we see that the two minors of G 1 are 1 + D + D 2 and 1 + D 2 ,
which have greatest common divisor 1. Thus G 1 is noncatastrophic. Also G 1 has the finite
weight right inverse

    K = [D   1 + D]T ,
also confirming that G 1 is noncatastrophic.
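For a (2, 1) encoder the 1 × 1 minors are just the two entries of G, so condition (i) of the Massey–Sain Theorem reduces to one polynomial gcd over F2 . A sketch (polynomials as Python ints, bit i = coefficient of D^i; the helper names are ours) applied to the two encoders of Example 14.5.4:

```python
def gf2_mod(a, b):
    # Remainder of a divided by b in F_2[D].
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a, b):
    # Euclidean algorithm in F_2[D].
    while b:
        a, b = b, gf2_mod(a, b)
    return a

# G_1' = [1 + D^3, 1 + D + D^2 + D^3]: gcd is 1 + D, not a power of D,
# so G_1' is catastrophic.
print(bin(gf2_gcd(0b1001, 0b1111)))   # 0b11
# G_1 = [1 + D + D^2, 1 + D^2]: gcd is 1, so G_1 is noncatastrophic.
print(bin(gf2_gcd(0b111, 0b101)))     # 0b1
```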
Example 14.5.5 In Example 14.1.1, we give three generator matrices for the (4, 2) convolutional code C 2 . All of these encoders are noncatastrophic. We leave the verification of this
fact for G ′2 and G ′′2 as an exercise. The minor obtained from the first and second columns
of G 2 is 1 + D, while the minor from the first and third columns of G 2 is D. Hence the
greatest common divisor of all six minors of G 2 must be 1.
Exercise 835 Verify that the encoders G ′2 and G ′′2 for the (4, 2) convolutional code C 2 from
Example 14.1.1 are noncatastrophic by showing that the greatest common divisor of the
minors of each encoder is 1.
Exercise 836 Find 4 × 2 finite weight right inverses for each of the encoders G 2 , G ′2 , and
G ′′2 of the (4, 2) convolutional code C 2 from Example 14.1.1. Note: Many entries can be
chosen to be 0.
Exercise 837 Let G ′′′2 be the generator matrix of the (4, 2) convolutional code C 2 in
Example 14.1.1.
(a) Show that the greatest common divisor of the six minors of G ′′′2 is 1 + D + D 2 and
hence that G ′′′2 is a catastrophic encoder.
(b) Show that the input

    x = (D 2 /(1 + D + D 2 ),  1/(1 + D + D 2 ))

to the encoder G ′′′2 has infinite weight but that the encoded codeword c = xG ′′′2 has finite
weight. Also give c.
Exercise 838 Let C 5 be the (2, 1) convolutional code from Exercise 823 and Example 14.4.8
with generator matrices

    G 5 = [1   1 + D]  and  G ′5 = [1 + D   1 + D 2 ].
(a) Suppose that G ′5 is the encoder. Find a right inverse of G ′5 and use it to produce the
input if the output is the codeword (1, 1 + D).
(b) Draw the state diagram for the encoder G ′5 .
(c) Draw the trellis diagram for the encoder G ′5 .
(d) Apply the Viterbi Algorithm to decode the received output vector 1101000000 · · · (of
infinite length). Note that this is the interleaved codeword (1, 1 + D). Note also that to
apply the Viterbi Algorithm to a received vector of infinite length, follow the path of
minimum weight through the trellis without trying to end in the zero state.
(e) Based on parts (a) or (d), why is G ′5 a catastrophic encoder?
(f ) What is the greatest common divisor of the 1 × 1 minors of G ′5 ?
(g) Without resorting to the Massey–Sain Theorem, show that G ′5 does not have a finite
weight inverse.
(h) Show that G 5 is a noncatastrophic generator matrix for C 5 .
15
Soft decision and iterative decoding
The decoding algorithms that we have considered to this point have all been hard decision
algorithms. A hard decision decoder is one which accepts hard values (for example 0s
or 1s if the data is binary) from the channel that are used to create what is hopefully the
original codeword. Thus a hard decision decoder is characterized by “hard input” and “hard
output.” In contrast, a soft decision decoder will generally accept “soft input” from the
channel while producing “hard output” estimates of the correct symbols. As we will see
later, the “soft input” can be estimates, based on probabilities, of the received symbols. In
our later discussion of turbo codes, we will see that turbo decoding uses two “soft input,
soft output” decoders that pass “soft” information back and forth in an iterative manner
between themselves. After a certain number of iterations, the turbo decoder produces a
“hard estimate” of the correct transmitted symbols.
15.1
Additive white Gaussian noise
In order to understand soft decision decoding, it is helpful to take a closer look first at
the communication channel presented in Figure 1.1. Our description relies heavily on the
presentation in [158, Chapter 1]. The box in that figure labeled “Channel” is more accurately
described as consisting of three components: a modulator, a waveform channel, and a
demodulator; see Figure 15.1. For simplicity we restrict ourselves to binary data. Suppose
that we transmit the binary codeword c = c1 · · · cn . The modulator converts each bit to a
waveform. There are many modulation schemes used. One common scheme is called binary
phase-shift keying (BPSK), which works as follows. Each bit is converted to a waveform
whose duration is T seconds, the length of the clock cycle. If the bit 1 is to be transmitted
beginning at time t = 0, the waveform is

    b1 (t) = { √(2E s /T ) cos ωt   if 0 ≤ t < T,
             { 0                    otherwise,
where E s is the energy of the signal and ω = 2π /T is the angular frequency. To
transmit the bit 0 the waveform b0 (t) = −b1 (t) is used. (Notice that b0 (t) also equals
√(2E s /T ) cos(ωt + π ), thus indicating why “phase-shift” is included in the name of
the scheme.) Beginning at time t = 0, the codeword c is therefore transmitted as the
[Figure: c = c1 · · · cn (codeword) → Modulator → c(t) (codeword waveform) → Waveform channel, where e(t) (error from noise) is added → y(t) = c(t) + e(t) (received waveform) → Demodulator → y (received vector).]

Figure 15.1 Modulated communication channel.
waveform

    c(t) = Σ_{i=1}^{n} b_{c i} (t − (i − 1)T ).
Exercise 839 Starting at time t = 0, graph the waveform for:
(a) the single bit 1,
(b) the single bit 0, and
(c) the codeword 00101.
The waveform c(t) is transmitted over the waveform channel where noise e(t) in the form
of another wave may distort c(t). By using a matched filter with impulse response
    h(t) = { √(2/T ) cos ωt   if 0 ≤ t < T,
           { 0                otherwise,
the demodulator first converts the received waveform y(t) = c(t) + e(t) into an n-tuple of
real numbers y′ = y1′ y2′ · · · yn′ where

    yi′ = ∫_{(i−1)T}^{iT} y(t)h(i T − t) dt.                                  (15.1)

If no error occurs in the transmission of ci , then yi′ = √E s if ci = 1 and yi′ = −√E s
if ci = 0; see Exercise 840. We can produce a hard decision received binary vector y =
c + e = y1 y2 · · · yn by choosing

    yi = { 1 if yi′ > 0,
         { 0 if yi′ ≤ 0.                                                      (15.2)
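The pipeline of (15.1)–(15.2) can be imitated at the level of the matched-filter outputs: bit ci maps to ±√E s , Gaussian noise of variance N0 /2 is added, and the hard decision takes the sign. A hypothetical simulation sketch (standard library only; the function name and parameter values are our own choices):

```python
import math
import random

def bpsk_hard_channel(bits, Es=1.0, N0=0.5, seed=1):
    # Matched-filter outputs: +sqrt(Es) for bit 1, -sqrt(Es) for bit 0,
    # plus Gaussian noise with variance N0/2, then hard decision on the sign.
    rng = random.Random(seed)
    sigma = math.sqrt(N0 / 2)
    y_soft = [(1 if b else -1) * math.sqrt(Es) + rng.gauss(0, sigma)
              for b in bits]
    y_hard = [1 if y > 0 else 0 for y in y_soft]
    return y_soft, y_hard

soft, hard = bpsk_hard_channel([1, 0, 1, 1, 0])
print(soft, hard)
```

With N0 small the hard decisions reproduce the transmitted bits; as N0 grows, sign flips (bit errors) appear with the crossover probability computed in (15.3) below.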
These binary values yi would be the final output of the demodulator if our decoder is a hard
decision decoder.
Example 15.1.1 If the codeword c = 10110 is sent, the transmitted waveform is c(t) =
b1 (t) − b1 (t − T ) + b1 (t − 2T ) + b1 (t − 3T ) − b1 (t − 4T ). Suppose y(t) = c(t) + e(t) is
received. Assume that the matched filter yields y1′ = 0.7, y2′ = −1.1, y3′ = 0.5, y4′ = −0.1,
and y5′ = −0.5. The hard decision received vector is y = 10100 and an error has occurred
in the fourth coordinate.
Exercise 840 Show that

    ∫_{(i−1)T}^{iT} √(2E s /T ) cos(ω(t − (i − 1)T )) · √(2/T ) cos(ω(i T − t)) dt = E s .

Explain why this verifies the claim that if no error occurs during the transmission of a single
bit ci , then yi′ = ±√E s , where + is chosen if ci = 1 and − is chosen if ci = 0.
The noise can often be modeled as additive white Gaussian noise (AWGN). This model
describes noise in terms of the distribution of yi′ given by (15.1). If ci = 1 is transmitted,
the mean µ of yi′ is √E s . If ci = 0 is transmitted, the mean µ is −√E s . Subtracting the
appropriate value of µ, yi′ − µ is normally distributed with mean 0 and variance

    σ 2 = N0 /2.

The value N0 /2 is called the two-sided power spectral density.
Using (15.1) and (15.2) to produce a binary vector from y(t) is in fact a realization of
the binary symmetric channel model described in Section 1.11.2. The crossover probability
̺ of this channel is computed as follows. Suppose, by symmetry, that the bit ci = 1 and
hence b1 (t − (i − 1)T ) is transmitted. An error occurs if yi′ ≤ 0 and the received bit would
be declared to be 0 by (15.2). Since yi′ − µ is normally distributed with mean 0 and variance
σ 2 = N0 /2,
    ̺ = prob(yi′ ≤ 0) = (1/(√(2π) σ)) ∫_{−∞}^{0} e^{−(y−µ)²/(2σ²)} dy
                      = (1/√(2π)) ∫_{−∞}^{−√(2E s /N0 )} e^{−x²/2} dx.        (15.3)
Exercise 841 Do the following:
(a) Verify the right-most equality in (15.3).
(b) Verify that

    (1/√(2π)) ∫_{−∞}^{−√(2E s /N0 )} e^{−x²/2} dx = 1/2 − (1/2) erf(√(E s /N0 )),

where

    erf(z) = (2/√π) ∫_{0}^{z} e^{−t²} dt.
The function erf(z) is called the error function and tabulated values of this function are
available in some textbooks and computer packages.
On the other hand the output of the demodulator does not have to be binary. It can be
“soft” data that can be used by a soft decision decoder. Notice that the hard decision value yi
was chosen based on the threshold 0; that is, yi is determined according to whether yi′ > 0
or yi′ ≤ 0. This choice does not take into account how likely or unlikely the value of yi′ is,
given whether 0 or 1 was the transmitted bit ci . The determination of yi from yi′ is called
quantization. The fact that there are only two choices for yi means that we have employed
binary quantization.
[Figure: channel transition diagram; each of the inputs 0 and 1 is joined by an edge to each of the eight outputs 04 , 03 , 02 , 01 , 11 , 12 , 13 , 14 .]

Figure 15.2 Binary input, 8-ary output discrete memoryless channel.
We could have other quantizations that determine the output of the demodulator. One
example, which will become the basis of our soft decision decoding, is the following.
Suppose we divide the real line into a disjoint union of eight intervals by choosing seven
threshold values. Then we could assign yi any one of eight values. We will use 01 , 02 , 03 ,
04 , 11 , 12 , 13 , 14 for these eight values. For example, suppose that (a, b] (where a = −∞
or b = ∞ are possible) is the interval that determines 01 ; in other words, suppose that yi is
chosen to be 01 exactly when yi′ ∈ (a, b]. Then the probability that yi′ ∈ (a, b], given that
ci is transmitted, is
    prob(01 | ci ) = prob(a < yi′ ≤ b | ci ) = (1/(√(2π) σ)) ∫_{a}^{b} e^{−(y−µ)²/(2σ²)} dy,      (15.4)

where µ = √E s if ci = 1 and µ = −√E s if ci = 0. Therefore once the threshold levels are
chosen, using equations similar to (15.4), we can calculate 16 probabilities prob(y | c) where
y ranges through the eight possible values 01 , . . . ,14 and c = 0 or 1. These probabilities
are characteristic of the channel under the assumption that the channel is subject to AWGN;
they are the channel statistics. The probabilities can be computed from the knowledge of the
signal energy E s and the two-sided power spectral density N0 /2 recalling that σ 2 = N0 /2.
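Each channel statistic prob(y | c) in (15.4) is a difference of two values of the normal distribution function, so the whole table of 16 probabilities follows from math.erf once thresholds are fixed. A sketch with hypothetical symmetric thresholds (our own choice of values, in the same units as √E s ; the function names are ours):

```python
import math

def norm_cdf(x, mu, sigma):
    # Normal distribution function via the error function erf.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def channel_stats(thresholds, Es=1.0, N0=1.0):
    # thresholds: seven increasing reals splitting the line into eight intervals;
    # returns prob(y | c) for c = 0, 1 and the eight quantization intervals.
    sigma = math.sqrt(N0 / 2)
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    stats = {}
    for c, mu in ((0, -math.sqrt(Es)), (1, math.sqrt(Es))):
        stats[c] = [norm_cdf(b, mu, sigma) - norm_cdf(a, mu, sigma)
                    for a, b in zip(cuts, cuts[1:])]
    return stats

s = channel_stats([-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
print(sum(s[0]), sum(s[1]))   # each row of probabilities sums to 1
```

By symmetry of the thresholds, the eight probabilities emanating from 0 are the mirror image of those emanating from 1, as the discussion of Figure 15.2 suggests.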
The channel we have just described is a binary input, 8-ary output discrete memoryless
channel (DMC) and is pictured in Figure 15.2. (The “binary input” is the input to the
modulator, and the “8-ary output” is the output of the demodulator.) Along each edge we
could write the probability prob(y | c) as in the BSC of Figure 1.2. Of course the sum of the
eight probabilities along the edges emanating from 0 add to 1 as do the eight probabilities
along the edges emanating from 1. Presumably we would choose our notation so that the
four highest probabilities emanating from 0 would terminate at 01 , . . . , 04 and the four
highest probabilities emanating from 1 would terminate at 11 , . . . ,14 . At some later point
in time, the four values 01 , . . . , 04 could be assigned 0 and the four values 11 , . . . ,14 could
be assigned 1.
By going to this 8-level quantization we are taking into account the likelihood of getting
a value yi′ given that ci is sent. It is not surprising that this “soft” quantized data together
with the probabilities prob(y | c) can be used to create a more effective decoder. We can
actually quantify how much “gain” there is when using coding with hard decision decoding
compared with using no coding at all. Furthermore, we can measure the potential “gain”
when soft decision decoding is used rather than hard decision decoding.
In Section 1.11.2 we presented Shannon’s Theorem for a BSC. There is a corresponding
theorem for the Gaussian channel if we give the appropriate channel capacity. Assume
the Gaussian channel is subject to AWGN with two-sided spectral density N0 /2. Suppose,
further, that the signaling power is S and bandwidth is W . Then Shannon showed that if the
capacity C G (W ) of the channel is defined to be
S
C G (W ) = W log2 1 +
,
(15.5)
N0 W
then arbitrarily reliable communication is possible provided the rate of information transmission is below capacity and impossible for rates above C G (W ).
If we allow unlimited bandwidth, then the channel capacity approaches C G∞ , where
S
S
∞
bits/s.
(15.6)
C G = lim W log2 1 +
=
W →∞
N0 W
N0 ln 2
Earlier we let T be the time to transmit a single bit. So k information bits can be transmitted
in τ = nT seconds. The energy per bit, denoted E b , is then
    E b = Sτ/k = S/R t ,                                                      (15.7)

where R t = k/τ is the rate of information transmission. Combining this with (15.6), we
see that

    C G∞ /R t = E b /(N0 ln 2).                                               (15.8)
Shannon’s Theorem tells us that to have communication over this channel, we need
R t < C G∞ .1 Using (15.8), this implies that

    E b /N0 > ln 2 ≈ −1.6 dB.                                                 (15.9)
Thus to have reliable communication over a Gaussian channel, the signal-to-noise ratio
E b /N0 , which measures the relative magnitude of the energy per bit and the noise energy,
must exceed the value −1.6 dB, called the Shannon limit.2 As long as the signal-to-noise
ratio exceeds the Shannon limit, Shannon’s Theorem guarantees that there exists a communication system, possibly very complex, which can be used for reliable communication
over the channel.
Exercise 842 Verify the right-most equality in (15.6).
If we require that hard decision decoding be used, then the channel has a different capacity.
To have reliable communication with hard decision decoding, we must have

    E b /N0 > (π/2) ln 2 ≈ 0.4 dB.
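Both limits are fixed constants and easy to check with the decibel conversion from the footnote (x in dB is 10 log10 x). A quick numeric sketch:

```python
import math

# Convert a unitless quantity to decibels.
dB = lambda x: 10 * math.log10(x)

print(round(dB(math.log(2)), 2))                 # Shannon limit: -1.59 dB
print(round(dB(math.pi / 2 * math.log(2)), 2))   # hard decision limit: 0.37 dB
```

Rounded to one decimal place these give the −1.6 dB and 0.4 dB quoted in the text, and their difference is the 2 dB potential gain of soft over hard decision decoding discussed below.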
1 Capacity of a channel can be defined in more than one way. For the version defined here, Shannon’s Theorem
examines the connection between the information transmission rate R t and the channel capacity. In the version
of Shannon’s Theorem for binary symmetric channels presented in Section 1.11.2, channel capacity was defined
in the way that connects the information rate R = k/n of an [n, k] code to the channel capacity.
2 A unitless quantity x, when converted to decibels, becomes 10 log10 (x) dB.
Thus potentially there is a 2 dB gain if we use soft decision decoding rather than hard
decision decoding over a Gaussian channel. Shortly we explain what such a gain will yield.
First consider what happens when we do not use coding to communicate over a channel.
Let Pb be the bit error rate (BER), that is, the probability of a bit error in the decoding. If
there is no coding, then E b = E s since every signal is a single information bit. From (15.3),
the BER of uncoded binary phase-shift keying is
*
Pb = Q( 2E b /N0 ),
where
1
Q(z) = √
2π
D
∞
x2
e− 2 d x.
z
In Figure 15.3, we plot Pb for uncoded BPSK as a function of signal-to-noise ratio. (Note
that the vertical scale is logarithmic, base 10.) We also indicate in the figure the Shannon
limit of −1.6 dB and the hard decision limit of 0.4 dB.
As an illustration of what Figure 15.3 tells us, the acceptable bit error rate for image data
transmitted by deep space satellites is typically in the range of 5 × 10−3 to 1 × 10−7 [352].
Suppose that we desire a BER of 1 × 10−5 . We see from Figure 15.3 that if we transmitted
using BPSK without any coding, then the signal-to-noise ratio must be at least 9.6 dB. Thus
with coding using hard decision decoding, there is a potential coding gain of 9.2 dB. With
coding using soft decision decoding, there is a potential coding gain of 11.2 dB.
This improvement can be measured in several different ways. For instance, if we examine
(15.7) and assume that N0 , k, and τ remain fixed, a 1 dB gain means that the signal power S
can be reduced to about 10−0.1 S ≈ 0.794S. If the BER is 1 × 10−5 and if we could achieve
a 9.2 dB coding gain by using a code C, then the signal power required by uncoded BPSK
could be reduced to about 10−0.92 S ≈ 0.120S with the use of C. Such a reduction in signal
power is very significant. In deep space satellite communication this can mean smaller
batteries (and thus less weight), longer operation of the spacecraft, or smaller transmitter
size. In the late 1960s each dB of coding gain in satellite communications was estimated to
be worth US$1,000,000 in development and launch costs [227].
We can analyze potential coding gain when the channel is bandwidth limited using a code
of information rate R. Assume communication uses an [n, k] binary code, which therefore
has information rate R = k/n. If we are sampling at the Nyquist rate of 2W samples per
second and transmitting k information bits in τ seconds, then n = 2W τ . Therefore the rate
Rt of information transmission is
    R t = k/τ = 2W k/n = 2W R.                                                (15.10)
From (15.7) and (15.10),

    S/(N0 W ) = 2R E b /N0 .

Using this equation, (15.5), and Shannon’s Theorem, to have reliable communication

    R t = 2W R < W log2 (1 + 2R E b /N0 ).
[Figure: plot of P b (BER, logarithmic scale from 10−1 down to 10−6 ) against E b /N0 (dB, from −2 to 12), showing the uncoded BPSK curve (reaching P b = 10−5 at 9.6 dB), the Shannon limit (−1.6 dB), the hard decision limit (0.4 dB) with the 2.0 dB gap between them, and the bounds for R = 1/6 (−1.08 dB), R = 1/3 (−0.55 dB), R = 1/2 (0 dB), and R = 3/4 (0.86 dB).]

Figure 15.3 Potential coding gain in a Gaussian channel.
Solving for E b /N0 yields

    E b /N0 > (2^{2R} − 1)/(2R),                                              (15.11)
giving the potential coding gain for a code of information rate R.
Notice that if the information transmission rate Rt is held constant, then by (15.10),
when W → ∞, R → 0. Letting R → 0 in (15.11), we obtain E_b/N_0 > ln 2, agreeing with
(15.9) in the case where the channel has unlimited bandwidth. Suppose that we keep the
bandwidth fixed as when the channel is bandwidth limited. The right-hand side of (15.11)
is an increasing function of R. Hence to communicate near the Shannon limit, R must be
close to 0; by (15.10), the information transmission rate Rt must also be close to 0. The
following table illustrates the lower bound from (15.11), in dB, on signal-to-noise ratio for
a variety of values of R.
    R                          3/4     1/2     1/3     1/6
    (2^{2R} − 1)/(2R) (dB)     0.86    0       −0.55   −1.08        (15.12)
We include these values in Figure 15.3.³
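The entries of (15.12) can be reproduced directly from (15.11). A small sketch (the function name is ours, not the book's):

```python
import math

def eb_n0_bound_db(R: float) -> float:
    """Right-hand side of (15.11), (2^(2R) - 1)/(2R), expressed in dB."""
    return 10 * math.log10((2 ** (2 * R) - 1) / (2 * R))

for R in (3/4, 1/2, 1/3, 1/6):
    print(f"R = {R:.4f}: {eb_n0_bound_db(R):+.2f} dB")

# As R -> 0 the bound tends to ln 2, the -1.6 dB Shannon limit of (15.9).
print(f"{10 * math.log10(math.log(2)):+.2f} dB")
```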
Exercise 843 Verify the entries in (15.12).
15.2 A Soft Decision Viterbi Algorithm
In this section we examine how we might modify the Viterbi Algorithm (a hard decision
decoding algorithm) of Section 14.2 to obtain a soft decision Viterbi decoding algorithm.
The Soft Decision Viterbi Algorithm is carried out in precisely the same manner as the hard
decision Viterbi Algorithm once the edge weights in the trellis are appropriately defined,
except the survivor paths are the ones with highest path weights. The hard decision Viterbi
Algorithm is nearest neighbor decoding that, over a binary symmetric channel, is also
maximum likelihood decoding; see the discussion in Section 1.11.2. The Soft Decision
Viterbi Algorithm will also be maximum likelihood decoding.
Suppose that an input message x(i) = (x1 (i), . . . , xk (i)) for i = 0, 1, . . . , L − 1 is encoded using the generator matrix G of an (n, k) binary convolutional code to produce an
output codeword c(i) = (c1 (i), . . . , cn (i)) for i = 0, 1, . . . , L + M − 1. As previously, the
state s of the encoder at time i is denoted si . The zero state will be state a, the initial and
final state of the encoder. We will add M blocks of k zeros to the end of the message so that
the encoder will terminate in the zero state a L+M at time L + M. The bits of this codeword
are interleaved as in (14.1) and this bit stream is modulated, transmitted, and received as a
waveform y(t) that is demodulated using (15.1). Although any demodulation scheme can
be used, we will assume for concreteness that 8-level quantization is used as described in
the previous section. After deinterleaving the quantized data, we have a received vector
y(i) = (y_1(i), . . . , y_n(i)) for i = 0, 1, . . . , L + M − 1, where y_j(i) ∈ {0_1, . . . , 1_4}.
Since the Soft Decision Viterbi Algorithm is to perform maximum likelihood decoding,
the algorithm must find the codeword c that maximizes prob(y | c). However, as the channel
is memoryless,
    prob(y | c) = ∏_{i=0}^{L+M−1} ∏_{j=1}^{n} prob(y_j(i) | c_j(i)).

³ Equation (15.11) is valid if P_b is close to 0. If we can tolerate a BER of P_b not necessarily close to 0, then we can compress the data and obtain a modified version of (15.11). Taking this into account, the vertical lines in Figure 15.3 for the four values of R, for Shannon's limit, and for the hard decision limit actually curve slightly toward the left as they rise. So for BERs closer to 10^{−1}, we can obtain more coding gain than shown. See [158, Section 1.5] for a discussion and figure; see also [232].
Maximizing this probability is equivalent to maximizing its logarithm

    ln(prob(y | c)) = ∑_{i=0}^{L+M−1} ∑_{j=1}^{n} ln(prob(y_j(i) | c_j(i))),
which has the advantage of converting the product to a sum. (Recall that computing
the weight of a path in the original Viterbi Algorithm was accomplished by summing
edge weights.) We can employ a technique of Massey [226] that will allow us to make the
edge weights into integers. Maximizing ln(prob(y | c)) is equivalent to maximizing
    ∑_{i=0}^{L+M−1} ∑_{j=1}^{n} μ(y_j(i), c_j(i)),    (15.13)

with

    μ(y_j(i), c_j(i)) = A(ln(prob(y_j(i) | c_j(i))) − f_{i,j}(y_j(i))),    (15.14)
where A is a positive constant and f_{i,j}(y_j(i)) is an arbitrary function. The value

    ∑_{j=1}^{n} μ(y_j(i), c_j(i))    (15.15)
will be the edge weight in the trellis replacing the Hamming distance used in the hard
decision Viterbi Algorithm. To make computations simpler, first choose

    f_{i,j}(y_j(i)) = min_{c∈{0,1}} ln(prob(y_j(i) | c))
and then choose A so that µ(y j (i), c j (i)) is close to a positive integer. With these edge
weights, the path weights can be determined. The Soft Decision Viterbi Algorithm now
proceeds as in Section 14.2, except the surviving paths at a node are those of maximum
weight rather than minimum weight.
Example 15.2.1 We repeat Example 14.2.5 using the Soft Decision Viterbi Algorithm.
Assume the channel is a binary input, 8-ary output DMC subject to AWGN. Suppose
thresholds have been chosen so that prob(y | c) is given by

    c\y    0_4     0_3     0_2     0_1     1_1     1_2     1_3     1_4
    0      0.368   0.207   0.169   0.097   0.065   0.051   0.028   0.015
    1      0.015   0.028   0.051   0.065   0.097   0.169   0.207   0.368
Taking the natural logarithm of each probability yields

    c\y    0_4       0_3       0_2       0_1       1_1       1_2       1_3       1_4
    0      −0.9997   −1.5750   −1.7779   −2.3330   −2.7334   −2.9759   −3.5756   −4.1997
    1      −4.1997   −3.5756   −2.9759   −2.7334   −2.3330   −1.7779   −1.5750   −0.9997
[Figure: the truncated trellis on states a = 00, b = 01, c = 10, d = 11 for times i = 0, . . . , 8, with the received blocks 1_1 1_2, 0_4 1_1, 1_3 1_4, 0_4 1_1, 1_4 0_4, 0_3 1_3, 0_4 1_2, 1_4 1_3 across the top and each edge labeled with its weight from (15.15).]
Figure 15.4 Edge weights for the Soft Decision Viterbi Algorithm.
By subtracting the smaller value in each column of the preceding table from each entry in
the column, multiplying the result by A = 2.5, and rounding, we obtain μ(y, c):

    c\y    0_4   0_3   0_2   0_1   1_1   1_2   1_3   1_4
    0       8     5     3     1     0     0     0     0
    1       0     0     0     0     1     3     5     8
(We remark that the difference between the actual value and the rounded value in the above
table is never more than 0.005. Also, other choices for A are possible, such as A = 10000.)
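The μ(y, c) table above can be reproduced mechanically from the prob(y | c) table using (15.14) with the choice of f and A = 2.5 described in the text. A short sketch (variable names are ours):

```python
import math

# Channel statistics prob(y | c) of Example 15.2.1, columns 0_4, 0_3, ..., 1_4.
PROB = {0: [0.368, 0.207, 0.169, 0.097, 0.065, 0.051, 0.028, 0.015],
        1: [0.015, 0.028, 0.051, 0.065, 0.097, 0.169, 0.207, 0.368]}
A = 2.5

mu = {0: [], 1: []}
for p0, p1 in zip(PROB[0], PROB[1]):
    f = min(math.log(p0), math.log(p1))          # column minimum of ln prob(y | c)
    mu[0].append(round(A * (math.log(p0) - f)))  # (15.14), rounded to an integer
    mu[1].append(round(A * (math.log(p1) - f)))

print(mu[0])   # [8, 5, 3, 1, 0, 0, 0, 0]
print(mu[1])   # [0, 0, 0, 0, 1, 3, 5, 8]
```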
Now assume that a six-bit message followed by 00 has been encoded and

    y = 1_1 1_2 0_4 1_1 1_3 1_4 0_4 1_1 1_4 0_4 0_3 1_3 0_4 1_2 1_4 1_3
is the received demodulated vector. Note that when 0_1, . . . , 0_4 and 1_1, . . . , 1_4 are merged to
the hard outputs 0 and 1, respectively, we have the same received vector as in Example 14.2.5.
Figure 15.4 is the trellis of Figure 14.4 with the edge labels of that figure replaced by the
weights from (15.15) analogous to the trellis of Figure 14.5. As an illustration, the dashed
edge a0 c1 originally labeled 11 has weight µ(11 , 1) + µ(12 , 1) = 1 + 3 = 4. Figure 15.5
shows the survivor paths and their weights. The survivor path ending at state a when i = 8
is a_0 a_1 a_2 c_3 d_4 d_5 d_6 b_7 a_8, yielding the message 001111 (followed by two 0s) with encoding
0000110110100111. Recall that the message using hard decision Viterbi decoding was
111011 with encoding 1101100100010111. This example shows that hard decision and
soft decision decoding can yield far different results. For comparison the path weight in the
trellis of Figure 15.4 for the message 111011 from the hard decision decoding is 69, one
less than the path weight of 70 for the message 001111.
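The steps of Example 15.2.1 can be traced mechanically. The sketch below is a minimal soft decision Viterbi decoder; the EDGES table transcribes the state diagram of the (2, 1) encoder of Example 14.1.1 by hand (an assumption to check against Figure 14.4), and the μ values are those derived above. At each state it keeps the survivor of *largest* accumulated weight.

```python
# mu(y, c) edge metrics of Example 15.2.1 (soft symbols as strings '04', ..., '14').
MU = {0: {'04': 8, '03': 5, '02': 3, '01': 1, '11': 0, '12': 0, '13': 0, '14': 0},
      1: {'04': 0, '03': 0, '02': 0, '01': 0, '11': 1, '12': 3, '13': 5, '14': 8}}

# State diagram: state -> {input bit: (next state, output bits)}.
EDGES = {'a': {0: ('a', (0, 0)), 1: ('c', (1, 1))},
         'b': {0: ('a', (1, 1)), 1: ('c', (0, 0))},
         'c': {0: ('b', (1, 0)), 1: ('d', (0, 1))},
         'd': {0: ('b', (0, 1)), 1: ('d', (1, 0))}}

def soft_viterbi(y, L, M=2):
    """Soft decision Viterbi: survivors keep the HIGHEST accumulated weight."""
    paths = {'a': (0, [])}                    # state -> (weight, message bits so far)
    for i, (y1, y2) in enumerate(y):
        free = (0, 1) if i < L else (0,)      # appended zeros force input 0 at the end
        new = {}
        for s, (w, msg) in paths.items():
            for b in free:
                t, (c1, c2) = EDGES[s][b]
                wt = w + MU[c1][y1] + MU[c2][y2]
                if t not in new or wt > new[t][0]:
                    new[t] = (wt, msg + [b])
        paths = new
    return paths['a']                         # survivor ending in the zero state a

y = [('11', '12'), ('04', '11'), ('13', '14'), ('04', '11'),
     ('14', '04'), ('03', '13'), ('04', '12'), ('14', '13')]
weight, msg = soft_viterbi(y, L=6)
print(weight, ''.join(map(str, msg)))   # 70 00111100
```

Running it on the received vector of the example recovers the survivor weight 70 and the message 001111 (followed by two 0s) found above.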
Exercise 844 Do the following:
(a) Beginning with the values for prob(y | c), verify the values for µ(y, c) given in
Example 15.2.1.
(b) Verify the edge weights shown in Figure 15.4.
[Figure: the trellis of Figure 15.4 with only the survivor paths drawn and the weight of each survivor shown at its node, e.g. 0, 8, 17, 25, 35, 50, 59, 70 at state a for i = 1, . . . , 8; the final survivor at a_8 has weight 70.]
Figure 15.5 Survivor paths and their weights in the trellis of Figure 15.4.
(c) Verify that the survivor paths and weights for the Soft Decision Viterbi Algorithm of
Example 15.2.1 are as shown in Figure 15.5.
Exercise 845 Use the Soft Decision Viterbi Algorithm to decode the following received
vectors sent using the code of Example 15.2.1 together with the edge weights determined
by µ(y, c) in that example. (The trellis from Figure 14.4 will be useful.) Draw the trellis
analogous to that of Figure 15.5. Give the most likely message and codeword. Finally,
compare your results to those in Exercise 817.
(a) 1_4 1_3 0_3 0_2 0_4 1_3 1_2 1_3 1_4 1_2 1_3 1_1 1_2 1_4 0_3 1_4
(b) 0_3 1_4 1_3 0_2 1_4 1_1 0_3 1_4 1_3 0_2 0_4 1_4 1_3 0_2 0_3 1_2
It is interesting to apply the Soft Decision Viterbi Algorithm to a binary symmetric
channel with crossover probability ̺ instead of the binary input 8-ary output channel. In
the case of the BSC,
    prob(y | c) = ∏_{i=0}^{L+M−1} ∏_{j=1}^{n} prob(y_j(i) | c_j(i))
                = ∏_{i=0}^{L+M−1} ∏_{j=1}^{n} ̺^{d(y_j(i), c_j(i))} (1 − ̺)^{1−d(y_j(i), c_j(i))}.
Therefore from (15.13) and (15.14),

    μ(y_j(i), c_j(i)) = A(ln(̺^{d(y_j(i), c_j(i))} (1 − ̺)^{1−d(y_j(i), c_j(i))}) − f_{i,j}(y_j(i)))
                      = d(y_j(i), c_j(i)) A ln(̺/(1 − ̺)) + A ln(1 − ̺) − A f_{i,j}(y_j(i)).
Choosing A = −(ln(̺/(1 − ̺)))^{−1}, which is positive if ̺ < 1/2, and f_{i,j}(y_j(i)) = ln(̺),
we obtain

    μ(y_j(i), c_j(i)) = 1 − d(y_j(i), c_j(i)),
which implies, by (15.15), that the edge weights in the trellis equal

    n − d(y(i), c(i)).    (15.16)
The path of maximum weight through the trellis with these edge weights is exactly the same
path of minimum weight through the trellis with edge weights given by d(y(i), c(i)). The
latter is the outcome of the hard decision Viterbi Algorithm of Section 14.2, and hence the
hard decision and soft decision Viterbi Algorithms agree on a BSC.
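A small numeric illustration of this equivalence, using arbitrary made-up bit blocks rather than codewords of any particular code: since each edge's soft weight n − d and hard weight d sum to n, the two metrics always rank complete paths in opposite order.

```python
def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# A received sequence of n = 2 bit blocks and a few candidate paths.
y = [(1, 1), (0, 1), (1, 0), (0, 0)]
paths = [
    [(0, 0), (0, 0), (0, 0), (0, 0)],
    [(1, 1), (0, 1), (1, 1), (0, 0)],
    [(1, 1), (1, 1), (1, 0), (0, 1)],
]

n = 2
hard = [sum(hamming(c, r) for c, r in zip(p, y)) for p in paths]      # total d
soft = [sum(n - hamming(c, r) for c, r in zip(p, y)) for p in paths]  # total n - d

# Each soft weight is n*len(y) minus the hard weight, so the max-weight path
# under (15.16) is exactly the minimum-distance path.
print(hard, soft)                                        # [4, 1, 2] [4, 7, 6]
print(soft.index(max(soft)) == hard.index(min(hard)))    # True
```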
Exercise 846 Verify (15.16).
15.3 The General Viterbi Algorithm
The Viterbi Algorithm can be placed in a more general setting that will make its use more
transparent in a number of situations. This general setting, proposed by McEliece in [234],
allows the labels (or weights) of the trellis edges to lie in any semiring.
A semiring is a set A with two binary operations: addition, usually denoted +, and
multiplication, usually denoted ·. These operations must satisfy certain properties. First, + is
an associative and commutative operation; furthermore, there is an additive identity, denoted
0, such that u + 0 = u for all u ∈ A. Second, · is an associative operation; additionally, there
is a multiplicative identity, denoted 1, such that u · 1 = 1 · u = u for all u ∈ A. Finally, the
distributive law (u + v) · w = (u · w) + (v · w) holds for all u, v, and w in A.
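These axioms are easy to spot-check numerically. The sketch below tests them, on a small sample set, for one concrete A used for hard decision decoding later in this section: the nonnegative integers together with ∞, with min as the semiring addition and ordinary addition as the semiring multiplication.

```python
# Semiring spot check: additive identity is infinity, multiplicative identity is 0.
INF = float('inf')

def s_add(u, v): return min(u, v)   # semiring "+"
def s_mul(u, v): return u + v      # semiring "."

samples = [0, 1, 3, 7, INF]
for u in samples:
    assert s_add(u, INF) == u                        # u + 0_A = u
    assert s_mul(u, 0) == u == s_mul(0, u)           # u . 1_A = 1_A . u = u
    for v in samples:
        assert s_add(u, v) == s_add(v, u)            # + is commutative
        for w in samples:
            assert s_add(s_add(u, v), w) == s_add(u, s_add(v, w))   # + associative
            assert s_mul(s_mul(u, v), w) == s_mul(u, s_mul(v, w))   # . associative
            assert s_mul(s_add(u, v), w) == s_add(s_mul(u, w), s_mul(v, w))  # distributive
print("all axioms hold on the sample set")
```

This is only a check on finitely many samples, of course; the full verification is Exercise 847.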
Let T be a truncated trellis of a convolutional code; we will use Figure 14.4 as the model
for such a trellis.⁴ The vertices of the trellis consist of states s at times i, which we continue
to denote by s_i. An edge e in the trellis beginning in state s at time i − 1 and ending at state
s′ at time i is denoted e = (s_{i−1}, s′_i); the initial vertex s_{i−1} of e is denoted ∗e and the final
vertex is denoted e∗. The label on the edge e is α(e). For example, the edge in Figure 14.4
from state d at time 4 to state b at time 5 is e = (d_4, b_5), and so ∗e = d_4 and e∗ = b_5; also
α(e) = 01. The edge labels of T will come from a semiring A with operations + and ·. The
General Viterbi Algorithm, in essence, calculates the “flow” through the trellis. Suppose
that e_1 e_2 · · · e_m is a path P in the trellis, where e_1, . . . , e_m are edges satisfying e_i∗ = ∗e_{i+1}
for 1 ≤ i < m. The flow along P is the product

    ν(P) = α(e_1) · α(e_2) · · · α(e_m).    (15.17)
Note that the multiplication in A may not be commutative and hence the order in (15.17)
is important. Now suppose that s_i and s′_j are two vertices in T with i < j. The flow from s_i
to s′_j is

    ν(s_i, s′_j) = ∑_P ν(P),    (15.18)
where the summation runs through all paths P that start at vertex si and end at vertex s ′j .
Of course the summation uses the addition + of the semiring A; since addition in A is
commutative, the order of the summation is immaterial.
⁴ A trellis can be defined for a block code that is similar to a truncated trellis of a convolutional code. Such a trellis
will not necessarily have the same number of states at all times i (“time” is actually called “depth” in the trellis
of a block code). However, as with convolutional codes, the trellis of a block code will have a single starting
vertex and a single ending vertex. The results of this section will apply to such trellises as well; see [336] for an
excellent exposition of this topic.
Example 15.3.1 Let A be the set of nonnegative integers together with the symbol ∞.
A can be made into a semiring. Define the addition operation + on A as follows: if u
and v are in A, then u + v = min{u, v} where min{u, ∞} = u for all u ∈ A. Define the
multiplication operation · on A to be ordinary addition where u · ∞ = ∞ · u = ∞ for
all u ∈ A. In Exercise 847, you are asked to verify that, under these operations, A is a
semiring with additive identity ∞ and multiplicative identity 0. Consider the truncated
trellis determined by a convolutional encoder and a received vector as in Figure 14.5. The
edge labels of that trellis are in the set A. Using the multiplication in A and (15.17), the flow
along a path is the ordinary sum of the integer edge labels; this is exactly what we termed
the “weight” of the path in Section 14.2.3. Consider the flow ν(a0 , si ) from the initial state
a at time 0 to a state s at time i. By definition of addition in A and (15.18), this flow is
the minimum weight of all paths from a0 to si . We called a path that gave this minimum
weight a “survivor,” and so the flow ν(a0 , si ) is the weight of a survivor. The flow ν(a0 , a8 )
in the truncated trellis of Figure 14.5 is therefore the weight of any survivor in S(a, 8) as
determined by the Viterbi Algorithm presented in Section 14.2.3.
Exercise 847 Verify that the set A with operations defined in Example 15.3.1 is a semiring
with additive identity ∞ and multiplicative identity 0.
Example 15.3.2 The set A of nonnegative integers can be made into a semiring under the
following operations. Define · on A to be ordinary integer addition; if u and v are in A,
define + to be u + v = max{u, v}. Exercise 848 shows that, under these operations, A is
a semiring with additive and multiplicative identity both equal to 0. The edge labels of the
truncated trellis of Figure 15.4 are in the set A. As in Example 15.3.1, the flow along a
path in the trellis is the “weight” of the path. Using the definition of addition in A and
(15.18), the flow ν(a0 , si ) is the maximum weight of all paths from a0 to si . A path giving
this maximum weight is a survivor. The flow ν(a0 , a8 ) in the truncated trellis of Figure 15.4
is again the weight of any survivor through the trellis as determined by the Soft Decision
Viterbi Algorithm of Section 15.2.
Exercise 848 Verify that the set A with operations defined in Example 15.3.2 is a semiring
with additive and multiplicative identity both equal to 0.
Exercise 849 Let A be the set of all polynomials in x with integer coefficients. A is a
semiring under ordinary polynomial addition and multiplication. Consider a truncated trellis
of an (n, k) binary convolutional code that is determined by an encoder with memory M
used to encode messages of length L, followed by M blocks of k zeros; see Figure 14.4 for
an example. The label of an edge is a binary n-tuple, which is the output from the encoder as
determined by the state diagram described in Section 14.2. Relabel an edge of the trellis by
x^w, where w is the weight of the binary n-tuple originally labeling that edge. For instance,
in Figure 14.4, the edge from a_0 to c_1 labeled 11 is relabeled x^2, and the edge from a_0 to a_1
labeled 00 is relabeled 1. Thus this new trellis has edges labeled by elements of A.
(a) In the relabeled trellis, describe what the flow along a path represents.
(b) In the relabeled trellis, if the zero state a at time 0 is a0 and the zero state at time L + M
is a L+M , describe what the flow from a0 to a L+M represents.
(c) Compute the flow from a0 to a8 for the relabeled trellis in Figure 14.4.
Exercise 850 Let B be the set of all binary strings of any finite length including the empty
string ǫ. Let A be the set of all finite subsets of B including the empty set ∅. If u and v
are in A, define u + v to be the ordinary set union of u and v. If u = {u 1 , . . . , u p } and
v = {v1 , . . . , vq } where u i and v j are in B, define u · v = {u i v j | 1 ≤ i ≤ p, 1 ≤ j ≤ q}
where u i v j is the string u i concatenated (or juxtaposed) with the string v j .
(a) If u = {01, 100, 1101} and v = {001, 11}, what is u · v?
(b) Show that A is a semiring with additive identity the empty set ∅ and multiplicative
identity the single element set {ǫ} consisting of the empty string.
Now consider a truncated trellis of an (n, k) binary convolutional code that is determined
by an encoder with memory M used to encode messages of length L, followed by M blocks
of k zeros; see Figure 14.4 for an example. The label of an edge is a binary n-tuple, which is
the output from the encoder as determined by the state diagram described in Section 14.2.
Consider this label to be a single element set from the semiring A.
(c) In this trellis, describe what the flow along a path represents.
(d) In this trellis, if the zero state a at time 0 is a0 and the zero state at time L + M is a L+M ,
describe what the flow from a0 to a L+M represents.
(e) Compute the flow from a0 to a8 for the trellis in Figure 14.4.
(f) What is the connection between the answer in part (e) of this exercise and the answer
in part (c) of Exercise 849?
We are now ready to state the generalization of the Viterbi Algorithm. We assume that
the states of the trellis are in the set S where the zero state a at time 0 is a0 and the final
state at time L + M is a L+M . The General Viterbi Algorithm is as follows:
I. Set ν(a_0, a_0) = 1.
II. Repeat the following in order for i = 1, 2, . . . , L + M. Compute for each s ∈ S,

    ν(a_0, s_i) = ∑_e ν(a_0, ∗e) · α(e),    (15.19)

where the summation ranges over all edges e of T such that e∗ = s_i.
We claim that equation (15.19) actually computes the flow as defined by equation (15.18).
The two equations clearly agree for i = 1 by Step I as 1 is the multiplicative identity of A.
Inductively assume they agree for time i = I. From equation (15.18),

    ν(a_0, s_{I+1}) = ∑_P ν(P),

where the summation ranges over all paths P from a_0 to s_{I+1} with I + 1 edges. But each
such path consists of the first I edges making up a path P_{s′} ending at some state s′ at time
I along with the final edge e where ∗e = s′_I and e∗ = s_{I+1}. Thus

    ν(a_0, s_{I+1}) = ∑_{s′∈S} ∑_{P_{s′}} ν(P_{s′}) · α(e) = ∑_e (∑_{P_{s′}} ν(P_{s′})) · α(e),

where the last equality follows from the distributive property in A. By induction the inner
sum is ν(a_0, s′_I) = ν(a_0, ∗e), and so ν(a_0, s_{I+1}) is indeed the value computed in (15.19).
Thus the General Viterbi Algorithm correctly computes the flow from a0 to any vertex in
the trellis. In particular, the flow ν(a0 , a L+M ) is correctly computed by this algorithm.
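Steps I and II translate almost directly into code. The sketch below is generic in the semiring, which is passed in as an addition, a multiplication, and a multiplicative identity; the trellis is an edge list per time step. As a check, running it with the max-plus semiring of Example 15.3.2 on the trellis and μ edge weights of Example 15.2.1 (hardcoded here from that example) recovers the survivor weight 70 found in Section 15.2.

```python
def general_viterbi(edge_steps, start, add, mul, one):
    """Flow computation (15.19): nu(a0, s_i) over an arbitrary semiring."""
    flow = {start: one}                       # Step I: nu(a0, a0) = 1
    for step in edge_steps:                   # Step II: one pass per time step
        new = {}
        for s, t, label in step:              # edge from s to t with label
            if s in flow:
                term = mul(flow[s], label)
                new[t] = add(new[t], term) if t in new else term
        flow = new
    return flow

# Trellis of Example 15.2.1: mu metrics and the encoder's state diagram.
MU = {0: {'04': 8, '03': 5, '02': 3, '01': 1, '11': 0, '12': 0, '13': 0, '14': 0},
      1: {'04': 0, '03': 0, '02': 0, '01': 0, '11': 1, '12': 3, '13': 5, '14': 8}}
DIAGRAM = {'a': {0: ('a', (0, 0)), 1: ('c', (1, 1))},
           'b': {0: ('a', (1, 1)), 1: ('c', (0, 0))},
           'c': {0: ('b', (1, 0)), 1: ('d', (0, 1))},
           'd': {0: ('b', (0, 1)), 1: ('d', (1, 0))}}
y = [('11', '12'), ('04', '11'), ('13', '14'), ('04', '11'),
     ('14', '04'), ('03', '13'), ('04', '12'), ('14', '13')]

steps = []
for i, (y1, y2) in enumerate(y):
    step = []
    for s, trans in DIAGRAM.items():
        for b in ((0, 1) if i < 6 else (0,)):     # appended zeros at the end
            t, (c1, c2) = trans[b]
            step.append((s, t, MU[c1][y1] + MU[c2][y2]))
    steps.append(step)

# Max-plus semiring of Example 15.3.2: semiring "+" is max, "." is addition.
flow = general_viterbi(steps, 'a', add=max, mul=lambda u, v: u + v, one=0)
print(flow['a'])   # 70
```

With `add=min`, `mul=lambda u, v: u + v`, `one=0` and Hamming-distance labels, the same function performs the hard decision flow computation of Example 15.3.1.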
15.4 Two-way APP decoding
In this section we present a soft decision decoding algorithm for binary convolutional codes
that computes, at each instance of time, the probability that a message symbol is 0 based
on knowledge of the received vector and the channel probabilities. This information can be
used in two ways. First, knowing these probabilities, the decoder can decide whether or not
the message symbol at any time is 0 or 1. Second, the decoder can pass these probabilities on
for another decoder to use. From these probabilities, eventually hard decisions are made on
the message symbols. As we present the algorithm, called Two-Way a Posteriori Probability
(APP) Decoding, we will model it using a binary input, 8-ary output DMC subject to AWGN.
Adopting the notation of Section 15.2, a message x(i) = (x1 (i), . . . , xk (i)) with i =
0, 1, . . . , L − 1 is encoded starting in the zero state a0 using the generator matrix G of an
(n, k) binary convolutional code to produce a codeword c where c(i) = (c1 (i), . . . , cn (i))
for i = 0, 1, . . . , L + M − 1. Add M blocks of k zeros to the end of the message so that
the encoder will also terminate in the zero state a L+M at time L + M. After interleaving,
modulating, transmitting, demodulating, and deinterleaving, we have the received vector y,
where y(i) = (y_1(i), . . . , y_n(i)) for i = 0, 1, . . . , L + M − 1 with y_j(i) ∈ {0_1, . . . , 1_4}. We
assume that, in addition to the truncated code trellis and the received vector, the decoder
knows the channel statistics prob(y | c), the probability that the bit y is received given that
the bit c is transmitted.
The object of APP decoding is to compute the a posteriori probability prob(x j (i) = 0 | y).
The decoder can either pass these probabilities on to another decoder or make hard decisions
x̂_j(i) that estimate the message symbol x_j(i) according to

    x̂_j(i) = 0 if prob(x_j(i) = 0 | y) ≥ 1/2, and x̂_j(i) = 1 otherwise.    (15.20)
The computation of prob(x_j(i) = 0 | y) relies on the equation

    prob(x_j(i) = 0 | y) = prob(y and x_j(i) = 0) / prob(y).    (15.21)
Using the General Viterbi Algorithm described in Section 15.3 with the semiring A consisting of all nonnegative real numbers under ordinary addition and multiplication, the
numerator and denominator of (15.21) can be determined. This makes up the Two-Way
APP Decoding Algorithm.
As we describe APP decoding, we will reexamine Example 15.2.1 that was used to
illustrate soft Viterbi decoding in Section 15.2; this example was initially studied in Example 14.2.5 to demonstrate hard Viterbi decoding. We will need the truncated trellis used
in those examples. This trellis is found in both Figures 14.4 and 15.4; we recommend the
reader refer to these figures for helpful insight into APP decoding.
To compute the numerator and denominator of (15.21), we will relabel each edge e of
the truncated trellis using a label from A. We will still need to refer to the original label
c(i) = (c1 (i), . . . , cn (i)) on the edge; we will call the original label the “output” of the edge.
If the edge e starts at s_i and ends at s′_{i+1}, let α(e) = prob(y(i) and m(i)) be the new label on
the edge. This label can be computed using the probability formula

    prob(y(i) and m(i)) = prob(y(i) | m(i)) prob(m(i)).    (15.22)
We illustrate this computation of the edge labels.
Example 15.4.1 Consider the code and the message of Example 15.2.1. The encoding uses
a (2, 1) convolutional code originally considered in Example 14.1.1. The relevant trellis can
be found in Figures 14.4 and 15.4. The set of states is {a, b, c, d} and the received vector is
y = 1_1 1_2 0_4 1_1 1_3 1_4 0_4 1_1 1_4 0_4 0_3 1_3 0_4 1_2 1_4 1_3.
We assume that the channel statistics prob(y | c) are still given by the table

    c\y    0_4     0_3     0_2     0_1     1_1     1_2     1_3     1_4
    0      0.368   0.207   0.169   0.097   0.065   0.051   0.028   0.015
    1      0.015   0.028   0.051   0.065   0.097   0.169   0.207   0.368
as in Example 15.2.1. The message consists of six unknown bits followed by two 0s. We
also assume that

    prob(m(i) = 0) = 0.6 if i = 0, 1, 2, 3, 4, 5, and 1 if i = 6, 7,

and

    prob(m(i) = 1) = 0.4 if i = 0, 1, 2, 3, 4, 5, and 0 if i = 6, 7,
where prob(m(i) = 0), respectively prob(m(i) = 1), is the probability that the ith message
bit is 0, respectively 1. The values for these probabilities when i = 6 and 7 are clear
because we know that the seventh and eighth message bits are 0. Normally we would expect
the probabilities between times i = 0 and 5 to all equal 0.5. However, APP decoding can be
applied to iterative decoding in which the probabilities prob(m(i) = 0) and prob(m(i) = 1)
come from some other decoder and hence could be almost anything. Simply to illustrate
this, we have chosen the values to be different from 0.5.
Consider the edge e from state c to state b between time i = 1 and i = 2. The input
bit for this edge is 0, the edge output (original label) is 10, and the portion of the received
vector between i = 1 and i = 2 is 0_4 1_1. Thus by (15.22) and the probabilities in the previous
paragraph,

    α(e) = prob(0_4 1_1 | 10) prob(m(1) = 0)
         = prob(0_4 | 1) prob(1_1 | 0) prob(m(1) = 0)
         = 0.015 × 0.065 × 0.6 = 5.850 × 10^{−4}.
Similarly, if e is the edge between c_2 and d_3,

    α(e) = prob(1_3 1_4 | 01) prob(m(2) = 1)
         = prob(1_3 | 0) prob(1_4 | 1) prob(m(2) = 1)
         = 0.028 × 0.368 × 0.4 = 4.122 × 10^{−3}.
[Figure: the trellis of Figure 15.4 with the flow ν(a_0, s_i) written at each node; for example ν(a_0, a_1) = 1.989 × 10^{−3}, ν(a_0, c_1) = 6.557 × 10^{−3}, and ν(a_0, a_8) = 2.219 × 10^{−15}.]
If e is the edge between d_6 and b_7,

    α(e) = prob(0_4 1_2 | 01) prob(m(6) = 0)
         = prob(0_4 | 0) prob(1_2 | 1) prob(m(6) = 0)
         = 0.368 × 0.169 × 1 = 6.219 × 10^{−2}.
The values of the new edge labels are given in Table 15.1. In the table, the label of the edge
from s_i to s′_{i+1} is denoted α(s_i, s′_{i+1}).
Exercise 851 Verify the edge labels in Table 15.1.
With the General Viterbi Algorithm, it is amazingly simple to compute the numerator and
the denominator of (15.21). To begin the process, first apply the General Viterbi Algorithm
to the trellis with the new edge labels. The algorithm computes the flow ν(a0 , si ) for all states
s and 0 ≤ i ≤ L + M. For the code and received vector considered in Example 15.4.1, the
flow ν(a0 , si ) is given in Figure 15.6; here the value of the flow is placed near the node si .
Next, apply the General Viterbi Algorithm backwards. That is, find the flow from a L+M to si
beginning with i = L + M and going down to i = 0; call this flow νb (a L+M , si ). Notice that
ν(si , a L+M ) = νb (a L+M , si ). For the code and received vector considered in Example 15.4.1,
the backward flow νb (a L+M , si ) is given in Figure 15.7; again the value of the flow is placed
near the node si .
Exercise 852 Verify the values of ν(a0 , si ) in Figure 15.6 using the General Viterbi Algorithm.
Exercise 853 Verify the values of νb (a L+M , si ) in Figure 15.7 using the backwards version
of the General Viterbi Algorithm.
To compute the denominator of (15.21), let M be the set of all possible message sequences
m = (m1 , . . . , mk ) where m(i) = (m1 (i), . . . , mk (i)) is arbitrary for 0 ≤ i ≤ L − 1 but is
Table 15.1 Edge labels for Example 15.4.1

                   i = 0         i = 1         i = 2         i = 3         i = 4         i = 5         i = 6         i = 7
α(a_i, a_{i+1})  1.989·10^{−3}  1.435·10^{−2}  2.520·10^{−4}  1.435·10^{−2}  3.312·10^{−3}  3.478·10^{−3}  1.877·10^{−2}  4.200·10^{−4}
α(a_i, c_{i+1})  6.557·10^{−3}  5.820·10^{−4}  3.047·10^{−2}  5.820·10^{−4}  2.208·10^{−3}  2.318·10^{−3}      —             —
α(b_i, a_{i+1})      —             —          4.571·10^{−2}  8.730·10^{−4}  3.312·10^{−3}  3.478·10^{−3}  2.535·10^{−3}  7.618·10^{−2}
α(b_i, c_{i+1})      —             —          1.680·10^{−4}  9.568·10^{−3}  2.208·10^{−3}  2.318·10^{−3}      —             —
α(c_i, b_{i+1})      —          5.850·10^{−4}  1.863·10^{−3}  5.850·10^{−4}  8.125·10^{−2}  4.704·10^{−4}  7.650·10^{−4}      —
α(c_i, d_{i+1})      —          1.428·10^{−2}  4.122·10^{−3}  1.428·10^{−2}  9.000·10^{−5}  1.714·10^{−2}      —             —
α(d_i, b_{i+1})      —             —          6.182·10^{−3}  2.142·10^{−2}  1.350·10^{−4}  2.571·10^{−2}  6.219·10^{−2}      —
α(d_i, d_{i+1})      —             —          1.242·10^{−3}  3.900·10^{−4}  5.417·10^{−2}  3.136·10^{−4}      —             —

(Edges absent from the truncated trellis at a given time are marked —.)
[Figure: the trellis of Figure 15.4 with the backward flow ν_b(a_8, s_i) written at each node; for example ν_b(a_8, a_8) = 1.000, ν_b(a_8, a_7) = 4.200 × 10^{−4}, and ν_b(a_8, a_0) = 2.219 × 10^{−15}.]
Figure 15.7 Backward flow νb (a L+M , si ) for Example 15.4.1.
a block of k zeros for L ≤ i ≤ L + M − 1. Thus each message corresponds to a single path
through the truncated trellis. Notice that the denominator satisfies
    prob(y) = ∑_{m∈M} prob(y and m) = ∑_{m∈M} ∏_{i=0}^{L+M−1} prob(y(i) and m(i)).
The latter is precisely the flow ν(a0 , a L+M ) through the trellis computed by the General
Viterbi Algorithm using the semiring A of nonnegative real numbers under ordinary addition
and multiplication.
To compute the numerator of (15.21), let M j (i) denote the subset of M where m j (i) = 0.
Each message in M j (i) corresponds to a single path through the truncated trellis where,
in going from time i to time i + 1, the edge chosen must correspond to one obtained only
when the jth message bit at time i is 0. As an illustration, consider Figure 14.4. Since k = 1
for the code of this figure, we only need to consider M1 (i); this set of messages corresponds
to all paths through the trellis that go along a solid edge between time i and time i + 1 as
solid edges correspond to input 0. The numerator of (15.21) satisfies
    prob(y and x_j(i) = 0) = ∑_{m∈M_j(i)} prob(y and m) = ∑_{m∈M_j(i)} ∏_{i=0}^{L+M−1} prob(y(i) and m(i)).
Let E(i, j) be all the edges from time i to time i + 1 that correspond to one obtained when
the jth message bit at time i is 0. Then

    prob(y and x_j(i) = 0) = ∑_{e∈E(i,j)} ν(a_0, ∗e) α(∗e, e∗) ν(e∗, a_{L+M}),
where α(∗e, e∗) is the edge label α(e) of e. But ν(e∗, a_{L+M}) = ν_b(a_{L+M}, e∗) and so

    prob(y and x_j(i) = 0) = ∑_{e∈E(i,j)} ν(a_0, ∗e) α(∗e, e∗) ν_b(a_{L+M}, e∗).    (15.23)
We now formally state the Two-Way APP Decoding Algorithm using an (n, k) binary
convolutional code with received vector y:
I. Construct the truncated trellis for the code. Let a be the zero state.
II. Compute the edge labels of the truncated trellis using (15.22).
III. For i = 0, 1, . . . , L + M, recursively compute the flow ν(a_0, s_i) for all states s using
the General Viterbi Algorithm.
IV. For i = L + M, L + M − 1, . . . , 0, recursively compute the backward flow
ν_b(a_{L+M}, s_i) for all states s using the backward version of the General Viterbi Algorithm.
V. For 0 ≤ i ≤ L + M − 1 and 1 ≤ j ≤ k, compute

    γ_j(i) = prob(y and x_j(i) = 0)

using (15.23).
VI. For 0 ≤ i ≤ L + M − 1 and 1 ≤ j ≤ k, compute

    prob(x_j(i) = 0 | y) = γ_j(i) / ν(a_0, a_{L+M}).

VII. For 0 ≤ i ≤ L + M − 1 and 1 ≤ j ≤ k, give an estimate x̂_j(i) of the message symbol
x_j(i) according to (15.20).
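The seven steps fit in a few dozen lines. The sketch below hardcodes the trellis, channel statistics, and a priori probabilities of Example 15.4.1 (all transcribed by hand, so worth double-checking against the tables above); the forward and backward passes are the General Viterbi Algorithm over the (+, ×) semiring of nonnegative reals.

```python
PROB = {'04': (0.368, 0.015), '03': (0.207, 0.028), '02': (0.169, 0.051),
        '01': (0.097, 0.065), '11': (0.065, 0.097), '12': (0.051, 0.169),
        '13': (0.028, 0.207), '14': (0.015, 0.368)}
DIAGRAM = {'a': {0: ('a', (0, 0)), 1: ('c', (1, 1))},
           'b': {0: ('a', (1, 1)), 1: ('c', (0, 0))},
           'c': {0: ('b', (1, 0)), 1: ('d', (0, 1))},
           'd': {0: ('b', (0, 1)), 1: ('d', (1, 0))}}
y = [('11', '12'), ('04', '11'), ('13', '14'), ('04', '11'),
     ('14', '04'), ('03', '13'), ('04', '12'), ('14', '13')]
L, M = 6, 2
N = L + M

def alpha(s, b, i):
    """Edge label (15.22) for the edge leaving state s on input b at time i."""
    t, out = DIAGRAM[s][b]
    p = (1.0 if b == 0 else 0.0) if i >= L else (0.6 if b == 0 else 0.4)
    for sym, c in zip(y[i], out):
        p *= PROB[sym][c]
    return t, p

# Step III: forward flow nu(a0, s_i).
fwd = [{'a': 1.0}]
for i in range(N):
    new = {}
    for s, f in fwd[i].items():
        for b in (0, 1):
            t, p = alpha(s, b, i)
            if p > 0:
                new[t] = new.get(t, 0.0) + f * p
    fwd.append(new)

# Step IV: backward flow nu_b(a_{L+M}, s_i).
bwd = [{} for _ in range(N)] + [{'a': 1.0}]
for i in range(N - 1, -1, -1):
    for s in DIAGRAM:
        tot = sum(p * bwd[i + 1].get(t, 0.0)
                  for t, p in (alpha(s, b, i) for b in (0, 1)))
        if tot > 0:
            bwd[i][s] = tot

# Steps V-VII: gamma over the input-0 edges (15.23), normalize, decide.
denom = fwd[N]['a']
post = []
for i in range(N):
    gamma = sum(f * p * bwd[i + 1].get(t, 0.0)
                for s, f in fwd[i].items()
                for t, p in [alpha(s, 0, i)])
    post.append(gamma / denom)

estimate = ''.join('0' if p >= 0.5 else '1' for p in post)
print([round(p, 4) for p in post])
print(estimate)   # 00101100
```

Run on the data of Example 15.4.1, this should reproduce the probabilities and the hard-decision message derived in Example 15.4.2; a discrepancy would point to a transcription slip in the hardcoded tables.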
Example 15.4.2 Continuing with Example 15.4.1, Steps I–IV of the Two-Way APP
Decoding Algorithm have been completed in Table 15.1 and in Figures 15.6 and 15.7.
The values for γ_1(i) when 0 ≤ i ≤ 7 computed in Step V are as follows. By (15.23),

    γ_1(0) = ν(a_0, a_0) α(a_0, a_1) ν_b(a_8, a_1) = 1.148 × 10^{−15}

and

    γ_1(1) = ν(a_0, a_1) α(a_1, a_2) ν_b(a_8, a_2) + ν(a_0, c_1) α(c_1, b_2) ν_b(a_8, b_2) = 1.583 × 10^{−15}.

For 2 ≤ i ≤ 6,

    γ_1(i) = ν(a_0, a_i) α(a_i, a_{i+1}) ν_b(a_8, a_{i+1}) + ν(a_0, b_i) α(b_i, a_{i+1}) ν_b(a_8, a_{i+1})
           + ν(a_0, c_i) α(c_i, b_{i+1}) ν_b(a_8, b_{i+1}) + ν(a_0, d_i) α(d_i, b_{i+1}) ν_b(a_8, b_{i+1}),

yielding

    γ_1(2) = 6.379 × 10^{−16},  γ_1(3) = 1.120 × 10^{−15},  γ_1(4) = 7.844 × 10^{−17},
    γ_1(5) = 3.175 × 10^{−17},  and  γ_1(6) = 2.219 × 10^{−15}.

Also,

    γ_1(7) = ν(a_0, a_7) α(a_7, a_8) ν_b(a_8, a_8) + ν(a_0, b_7) α(b_7, a_8) ν_b(a_8, a_8) = 2.219 × 10^{−15}.
Since prob(x(i) = 0 | y) = γ_1(i)/ν(a_0, a_8), Step VI yields:

    prob(x(0) = 0 | y) = 0.5172,    prob(x(4) = 0 | y) = 0.0354,
    prob(x(1) = 0 | y) = 0.7135,    prob(x(5) = 0 | y) = 0.0143,
    prob(x(2) = 0 | y) = 0.2875,    prob(x(6) = 0 | y) = 1.0000,
    prob(x(3) = 0 | y) = 0.5049,    prob(x(7) = 0 | y) = 1.0000.
By (15.20), the APP decoder generates the message 00101100 in Step VII and therefore the
codeword 0000111000010111. This message disagrees in component x(3) with the decoded
message found using the Soft Decision Viterbi Algorithm in Example 15.2.1. Notice that
the value of this bit x(3) (and also x(0)) is highly uncertain, as the probabilities above are
close to 1/2. Notice also that prob(x(6) = 0 | y) = prob(x(7) = 0 | y) = 1.0000, just as expected.
Exercise 854 Repeat Examples 15.4.1 and 15.4.2 using the same values for prob(y | c), but

prob(m(i) = 0) = 0.5 if i = 0, 1, 2, 3, 4, 5, and prob(m(i) = 0) = 1 if i = 6, 7,

and

prob(m(i) = 1) = 0.5 if i = 0, 1, 2, 3, 4, 5, and prob(m(i) = 1) = 0 if i = 6, 7,

in place of the values of prob(m(i) = 0) and prob(m(i) = 1) of those examples.
Exercise 855 Use the Two-Way APP Decoding Algorithm to decode the following received
vectors encoded by the code of Example 15.4.1 together with the probabilities of that
example. Compare your results with those in Exercises 817 and 845.
(a) 14 13 03 02 04 13 12 13 14 12 13 11 12 14 03 14 ,
(b) 03 14 13 02 14 11 03 14 13 02 04 14 13 02 03 12 .
There are other APP decoders for various types of codes; see [158, Chapter 7]. Also there
are a number of connections between trellises and block codes; [336] gives an excellent
survey. The February 2001 issue of IEEE Transactions on Information Theory is devoted
to relationships among codes, graphs, and algorithms. The survey article [180] in that issue
gives a nice introduction to these connections through a number of examples.
15.5 Message passing decoding
In this section we introduce the concept of message passing decoding; such decoding is
used in several contexts, including some versions of turbo decoding. Message passing is
our first example of iterative decoding. In the next section, we present two other types of
iterative decoding applied to low density parity check codes.
To describe message passing it is helpful to begin with a bipartite graph, called a Tanner graph⁵ [325], constructed from an r × n parity check matrix H for a binary code C of length n. The Tanner graph has two types of vertices, called variable and check nodes. There are n variable nodes, one corresponding to each coordinate or column of H. There are r check nodes, one for each parity check equation or row of H. The Tanner graph has edges only between variable nodes and check nodes; a given check node is connected to precisely those variable nodes where there is a 1 in the corresponding column of H.

(Footnote 5: Tanner graphs are actually more general than the definition we present. These graphs themselves were generalized by Wiberg, Loeliger, and Kötter [350].)

[Figure 15.8 (Tanner graph of the code in Example 15.5.1): the seven variable nodes c1, . . . , c7 are drawn as solid vertices and the three check nodes as ⊕ vertices, each check node joined to the variable nodes appearing in its parity check.]
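The construction just described can be expressed directly in code; in this sketch the function and variable names are ours:

```python
def tanner_graph(H):
    """Build the Tanner graph of an r x n parity check matrix H, given as a
    list of rows of 0s and 1s.  checks[j] lists the variable nodes joined to
    check node j (the 1s in row j); variables[i] lists the check nodes joined
    to variable node i (the 1s in column i)."""
    r, n = len(H), len(H[0])
    checks = [[i for i in range(n) if H[j][i] == 1] for j in range(r)]
    variables = [[j for j in range(r) if H[j][i] == 1] for i in range(n)]
    return checks, variables
```

For instance, with the parity check matrix of Example 15.5.1 below, check node 1 is joined to c1, c2, c3, and variable node c1 is joined to all three check nodes.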
Example 15.5.1 Let C be the [7, 4, 2] binary code with parity check matrix

        1 1 1 0 0 0 0
    H = 1 0 0 1 1 0 0 .
        1 0 0 0 0 1 1
Figure 15.8 gives the Tanner graph for this code. The solid vertices are the variable nodes,
while the vertices marked ⊕ represent the check nodes.
Message passing sends information from variable to check nodes and from check to
variable nodes along edges of the Tanner graph at discrete points of time. In the situation
we examine later, the messages passed will represent pairs of probabilities. Initially every
variable node has a message assigned to it; for example, these messages can be probabilities
coming directly from the received vector. At time 1, some or all variable nodes send the
message assigned to them to all attached check nodes via the edges of the graph. At time
2, some of those check nodes that received a message process the message and send along
a message to some or all variable nodes attached to them. These two transmissions of
messages make up one iteration. This process continues through several iterations. In the
processing stage there is an important rule that must be followed. A message sent from a
node along an adjacent edge must not depend on a message previously received along that
edge.
We will illustrate message passing on a Tanner graph using an algorithm described in [91].
It has elements reminiscent of the Soft Decision Viterbi Algorithm presented in Section 15.2.
First, we assume that T is the Tanner graph of a length n binary code C obtained from a
parity check matrix H for C. We further assume that T has no cycles.6 As in Section 15.2,
for concreteness we assume that the demodulation scheme used for the received vector is
8-level quantization. Therefore suppose that c = c1 c2 · · · cn is the transmitted codeword,
and y = y1 y2 · · · yn is received with yi ∈ {01 , . . . , 14 }. As with the Soft Decision Viterbi
Algorithm, our goal is to perform maximum likelihood decoding and find the codeword c
that maximizes prob(y | c). We assume the channel is memoryless and hence

prob(y | c) = ∏_{i=1}^{n} prob(yi | ci).
As in Section 15.2, take the logarithm turning the product into a sum and then rescale and
shift the probabilities so that the value to maximize is
∑_{i=1}^{n} µ(yi, ci),

where the maximum is over all codewords c = c1c2 · · · cn, analogous to (15.15).
The Tanner graph T has no cycles and hence is a tree or forest. For simplicity, assume T
is a tree; if T is a forest, carry out the algorithm on each tree in T separately. All the leaves
are variable nodes provided no row of H has weight 1, which we also assume. At the end
of the algorithm, messages will have been passed along all edges once in each direction.
Each message is a pair of nonnegative integers. The first element in the pair is associated
with the bit 0, and the second element in the pair is associated with the bit 1. Together the
pair will represent relative probabilities of the occurrence of bit 0 or bit 1. The Message
Passing Decoding Algorithm is as follows:
I. To initialize, assign the pair (µ(yi, 0), µ(yi, 1)) to the ith variable node. No assignment is made to the check nodes. Also, for every leaf, this same pair is the message leaving that leaf node, directed toward the attached check node.
II. To find a message along an edge e leaving a check node h, assume all messages directed
toward h, except possibly along e, have been determined. Suppose the messages entering
h except along e are
(ai1 , bi1 ), (ai2 , bi2 ), . . . , (ait , bit ).
Consider all sums

∑_{j=1}^{t} xij,     (15.24)

where xij is one of aij or bij. The message along e leaving h is (m0, m1), where m0 is the maximum of all such sums with an even number of bij's, and m1 is the maximum of all such sums with an odd number of bij's.
III. To find a message along an edge e leaving a variable node v, assume all messages
directed toward v, except possibly along e, have been determined. The message leaving
6
Unfortunately, if the Tanner graph has no cycles, the code has low minimum weight. For example, such codes
of rate 0.5 or greater have minimum weight 2 or less; see [79].
596
Soft decision and iterative decoding
v along e toward a check node is assigned the pair at v (from Step I) added to the sum
of all the incoming messages to v except along e.
IV. Once all messages in both directions along every edge have been determined by repetition of II and III, a new pair ( p0 , p1 ) is computed for every variable node v. This pair
is the sum of the two messages along any edge attached to v.
V. Decode the received vector as follows. For the ith variable node, let ci = 0 if p0 > p1
and ci = 1 if p1 > p0 .
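Steps II and III can be written compactly. The sketch below (function names ours) processes the incoming pairs one at a time, maintaining the best sum seen so far with an even and with an odd number of b's:

```python
def check_message(incoming):
    """Step II: combine the messages (a_i, b_i) entering a check node h along
    all edges except e.  Returns (m0, m1), where m0 is the maximum of the sums
    (15.24) using an even number of b's and m1 the maximum using an odd number.
    The empty sum (all a's so far) starts at 0; no odd sum exists yet."""
    m_even, m_odd = 0, float("-inf")
    for a, b in incoming:
        m_even, m_odd = max(m_even + a, m_odd + b), max(m_even + b, m_odd + a)
    return m_even, m_odd

def variable_message(node_pair, incoming):
    """Step III: the pair assigned to the variable node (from Step I) plus the
    componentwise sum of all incoming messages except along e."""
    p0, p1 = node_pair
    for a, b in incoming:
        p0, p1 = p0 + a, p1 + b
    return p0, p1
```

On the data of Example 15.5.2 below, check_message([(0, 5), (3, 0)]) returns (5, 8) and variable_message((0, 3), [(8, 9), (6, 5)]) returns (14, 17), matching the messages toward and from y1 in the figure.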
The rationale for the message passed in II is that the summation in (15.24) is associated with bit 0, respectively bit 1, if there are an even, respectively odd, number of bij's involved, as these sums correspond to adding an even, respectively odd, number of "probabilities" associated to 1s. We do not prove the validity of the algorithm; more discussion of the algorithm
is found in [91] and the references therein. Note again how closely related this algorithm is
to the Soft Decision Viterbi Algorithm.
Example 15.5.2 Suppose that y = 12 03 01 13 02 11 04 is the received vector when a codeword
is transmitted using the [7, 4, 2] code C of Example 15.5.1. Even though this code has
minimum distance 2, implying the code cannot correct any errors using nearest neighbor
decoding, we can still decode using soft decision decoding to find the maximum likely
codeword. We will use the Tanner graph for C in Figure 15.8 determined by the parity
check matrix H given in Example 15.5.1. We will also use the values of µ(y, c) given in
Example 15.2.1. The messages obtained using the Message Passing Decoding Algorithm
are given in Figure 15.9. Step I of the algorithm is carried out at time 1 using the table
of values for µ(y, c) in Example 15.2.1 applied to the received vector. At time 2, we pass
messages from the three check nodes toward y1 using II. For example, the message (5, 8)
directed toward y1 is determined from the messages (0, 5) and (3, 0) entering the check
node from y4 and y5 as follows:
(5, 8) = (max{0 + 3, 5 + 0}, max{0 + 0, 5 + 3}).
Similarly, the message (6, 5) directed toward y1 is
(6, 5) = (max{1 + 5, 0 + 0}, max{0 + 5, 1 + 0}).
After time 2, every edge has one message along it in some direction. At time 3, messages
are sent along the three edges leaving y1 using III. For example, the message (14, 17) is the
sum of the node pair (0, 3) plus the sum (8, 9) + (6, 5). At time 4, messages are sent to all
leaf nodes using II. For instance, the message (17, 20) sent to y4 uses the edges (3, 0) and
(14, 17) yielding
(17, 20) = (max{3 + 14, 0 + 17}, max{3 + 17, 0 + 14}).
After time 4, all edges have messages in each direction. By adding the edges attached to
each variable node, Step IV gives:
            y1        y2        y3        y4        y5        y6        y7
(p0, p1)  (19, 25)  (25, 21)  (21, 25)  (17, 25)  (25, 19)  (19, 25)  (25, 16)
Note that the pair for y1 is obtained using any of the three edges attached to y1 ,
[Figure 15.9 (Message passing on the Tanner graph of Figure 15.8): the four panels show the messages present after times 1 through 4. The node pairs assigned in Step I are y1 (0, 3), y2 (5, 0), y3 (1, 0), y4 (0, 5), y5 (3, 0), y6 (0, 1), y7 (8, 0).]
namely, (19, 25) = (14, 17) + (5, 8) = (11, 16) + (8, 9) = (13, 20) + (6, 5). We decode y as c = 1011010 from Step V. Notice that for our received vector y and this codeword c, ∑_{i=1}^{7} µ(yi, ci) = 3 + 5 + 0 + 5 + 3 + 1 + 8 = 25, which is the largest component of each of the seven (p0, p1)'s.
Exercise 856 Verify the computations in Example 15.5.2.
Exercise 857 Let C be the code in Example 15.5.1 with Tanner graph in Figure 15.8. Use
the Message Passing Decoding Algorithm to decode the following received vectors; use the
values of µ(y, c) given in Example 15.2.1.
(a) y = 02 13 01 04 12 01 13 ,
(b) y = 13 03 02 11 02 04 12 .
Exercise 858 Construct the Tanner graph for the [10, 6, 2] binary code with parity check matrix

        1 1 1 1 0 0 0 0 0 0
    H = 1 0 0 0 1 1 0 0 0 0
        0 0 0 0 1 0 1 1 0 0 .
        0 1 0 0 0 0 0 0 1 1

Then use the Message Passing Decoding Algorithm to decode

y = 11 02 13 01 11 12 13 14 03 11.

Use the values of µ(y, c) given in Example 15.2.1.
15.6 Low density parity check codes
In the previous section we introduced our first example of iterative decoding through message passing. In the mid 1990s, iterative decoding became a significant technique for the decoding of turbo codes. Iterative decoding was introduced by Elias in 1954 [77]; it reappeared with the work of R. G. Gallager on low density parity check codes in 1963 but seems to have been largely neglected since then, with only a few exceptions, until researchers realized that the decoding of turbo codes was closely related to Gallager's techniques. In this section we will introduce low density parity check codes and examine Gallager's two versions of iterative decoding. Our presentation relies heavily on [94, 95, 211]. For the interested reader, the use of message passing in low density parity check codes is described in [295].
In short, a binary low density parity check code is a code having a parity check matrix
with relatively few 1s which are dispersed with a certain regularity in the matrix. An
[n, k] binary linear code C is a low density parity check (LDPC) code provided it has
an m × n parity check matrix H where every column has fixed weight c and every row
of H has fixed weight r .7 This type of code will be denoted an (n, c, r ) LDPC code.
Note that such a matrix may not have independent rows and so is technically not a parity
check matrix in the sense we have used in this text. However, H can be row reduced
and the zero rows removed to form a parity check matrix for C with independent rows.
Thus the dimension of C is at least n − m. Counting the number of 1s in H by both rows
and columns gives the relationship nc = mr among the parameters. Generally, we want
r and c to be relatively small compared to n and thus the density of 1s in H is low.
(Footnote 7: A natural generalization for LDPC codes is to allow the row weights and column weights of H to vary, in some controlled manner. The resulting codes are sometimes termed "irregular" LDPC codes.)
        1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
        1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
    H = 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0
        0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0
        0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
        1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
        0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0
        0 0 1 0 0 0 0 1 1 0 0 0 0 1 0 0
        0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0

Figure 15.10 Parity check matrix for a (16, 3, 4) LDPC code.
Figure 15.10 gives the parity check matrix of a (16, 3, 4) code, which is a [16, 6, 6] binary
code.
Exercise 859 Show that the binary code C with parity check matrix given in Figure 15.10
has dimension 6 and minimum weight 6. Hint: Row reduce H to obtain the dimension. For
the minimum weight, show that C is even and that every codeword has an even number
of 1s in each of the four blocks of four coordinates, as presented in the figure. Then use
Corollary 1.4.14 to show that C has no weight 4 codewords.
Gallager developed two iterative decoding algorithms designed to decode LDPC codes
of long length, several thousand bits for example. The first iterative decoding algorithm is a
simple hard decision algorithm. Assume that the codeword c is transmitted using an (n, c, r )
binary LDPC code C, and the vector y is received. In the computation of the syndrome H yT ,
each received bit yi affects at most c components of that syndrome, as the ith bit is in c
parity checks. If among all the bits S involved in these c parity checks only the ith is in
error, then these c components of H yT will equal 1 indicating the parity check equations
are not satisfied. Even if there are some other errors among these bits S, one expects that
several of the c components of H yT will equal 1. This is the basis of the Gallager Hard
Decision Decoding Algorithm.
I. Compute H yT and determine the unsatisfied parity checks, that is, the parity checks
where the components of H yT equal 1.
II. For each of the n bits, compute the number of unsatisfied parity checks involving that
bit.
III. Change the bits of y that are involved in the largest number of unsatisfied parity checks;
call the resulting vector y again.
IV. Iteratively repeat I, II, and III until either H yT = 0, in which case the received vector
is decoded as this latest y, or until a certain number of iterations is reached, in which
case the received vector is not decoded.
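The four steps translate directly into code; in this sketch (function name ours) every bit tied for the largest number of unsatisfied checks is flipped at each iteration:

```python
def gallager_hard_decision(H, y, max_iters=20):
    """Gallager Hard Decision Decoding (Steps I-IV above).  H is the parity
    check matrix as a list of 0/1 rows, y the received vector as a 0/1 list.
    Returns the decoded vector, or None if the iteration limit is reached."""
    y = list(y)
    n = len(y)
    for _ in range(max_iters):
        # Step I: compute the syndrome; 1s mark unsatisfied parity checks
        syndrome = [sum(row[i] & y[i] for i in range(n)) % 2 for row in H]
        if not any(syndrome):
            return y
        # Step II: count the unsatisfied parity checks involving each bit
        counts = [sum(s for row, s in zip(H, syndrome) if row[i])
                  for i in range(n)]
        # Step III: change every bit attaining the maximum count
        worst = max(counts)
        for i in range(n):
            if counts[i] == worst:
                y[i] ^= 1
    return None
```

With the matrix of Figure 15.10 and the first received vector of Example 15.6.1 below, the first pass flips bits 1 and 10 and the second pass finds a zero syndrome, reproducing the decoding in the example; the third received vector of that example cycles and is reported as a failure.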
Example 15.6.1 Let C be the (16, 3, 4) LDPC code with parity check matrix H of
Figure 15.10. Suppose that
y = 0100101000100000
is received. Then H yT = (101011001001)T. There are six unsatisfied parity checks, corresponding to rows 1, 3, 5, 6, 9, and 12 of H. Bits 1 and 10 are each involved in three of these; all other bits are involved in two or fewer. Thus change bits 1 and 10 of y to obtain a new
y = 1100101001100000.
Iterating, we see that H yT = 0 and so this latest y is declared the transmitted codeword;
since it has distance 2 from the received vector, we have accomplished nearest neighbor
decoding.
Now suppose that
y = 1100101000000000
is received. Then H yT = (000001101001)T. Parity checks 6, 7, 9, and 12 are unsatisfied; bits
6, 10, 11, and 15 are in two unsatisfied parity checks with the other bits in fewer. Changing
these bits yields the new
y = 1100111001100010.
Iterating, H yT = (010101101001)T. Parity checks 2, 4, 6, 7, 9, and 12 are unsatisfied; bits 6
and 15 are in three unsatisfied parity checks with the other bits in two or fewer. Changing
bits 6 and 15 produces a new
y = 1100101001100000.
Iterating again yields H yT = 0 and so this latest y is declared the transmitted codeword,
which is distance 2 from the received vector.
Finally, suppose that
y = 0100101001001000
is received. Then H yT = (101100100100)T. There are five unsatisfied parity checks and bits
2, 3, 7, 11, 12, 13, and 15 are each involved in two of these; all other bits are involved in
fewer than two. Changing these bits gives a new
y = 0010100001110010.     (15.25)
Iterating, H yT is the all-one column vector, indicating that all parity checks are unsatisfied
and all bits are in three unsatisfied checks. So all bits of y are changed and in the next
iteration H yT is again the all-one column vector. Thus all bits are changed back giving
(15.25) again. The iterations clearly are caught in a cycle and will never reach a codeword.
So the original received vector is undecodable.
Exercise 860 Verify the results illustrated in Example 15.6.1.
Exercise 861 Let C be the (16, 3, 4) LDPC code with parity check matrix H in Figure 15.10.
Suppose the following vectors y are received. Decode them using the Gallager Hard Decision
Decoding Algorithm when possible.
(a) y = 1000010110100100,
(b) y = 1000010110100001,
(c) y = 0110110110100101.
Exercise 862 The Gallager Hard Decision Decoding Algorithm is performed in “parallel;”
all bits of y in the largest number of unsatisfied checks are changed at each iteration. There
is a “sequential” version of the algorithm. In this version only one bit is changed at a time.
Thus III is replaced by:
III.′ Among all bits of y that are involved in the largest number of unsatisfied parity checks,
change only the bit yi with smallest subscript i; call the resulting vector y again.
The sequential version is otherwise unchanged. Apply the sequential version of the algorithm
to the same received vectors as in Example 15.6.1 and compare your answers with those in
the example:
(a) y = 0100101000100000,
(b) y = 1100101000000000,
(c) y = 0100101001001000.
Gallager’s Soft Decision Iterative Decoding Algorithm can be presented in a number
of ways. We describe Mackay’s version [211], which he calls the Sum-Product Decoding
Algorithm. It applies more generally than we describe it here. We only present the algorithm
and refer the reader to Mackay’s paper for further discussion. Mackay has a visual animation
of the algorithm in action at:
http://wol.ra.phy.cam.ac.uk/mackay/codes/gifs/
Let C be an (n, c, r ) binary low density parity check code with m × n parity check matrix
H . A Tanner graph T for C can be formed. Number the variable nodes 1, 2, . . . , n and the
check nodes 1, 2, . . . , m. Every variable node is connected to c check nodes and every check
node is connected to r variable nodes. Let V ( j) denote the r variable nodes connected to
the jth check node; the set V ( j) \ s is V ( j) excluding the variable node s. Let C(k) denote
the c check nodes connected to the kth variable node; the set C(k) \ t is C(k) excluding the
check node t.
Suppose the codeword c is transmitted and y = c + e is received, where e is the unknown
error vector. Given the syndrome H yT = H eT = zT, the object of the decoder is to compute

prob(ek = 1 | z) for 1 ≤ k ≤ n.     (15.26)
From there, the most likely error vector and hence most likely codeword can be found.
The algorithm computes, in an iterative manner, two probabilities associated to each edge of the Tanner graph and each e ∈ {0, 1}. The first probability q^e_jk, for j ∈ C(k) and e ∈ {0, 1}, is the probability that ek = e given information obtained from checks C(k) \ j. The second probability r^e_jk, for k ∈ V(j) and e ∈ {0, 1}, is the probability that the jth check is satisfied given ek = e and the other bits ei for i ∈ V(j) \ k have probability distribution given by {q^0_ji, q^1_ji}. There are initial probabilities p^e_k = prob(ek = e) for 1 ≤ k ≤ n and
e ∈ {0, 1} used in the algorithm that come directly from the channel statistics. For example,
if communication is over a binary symmetric channel with crossover probability ρ, then
p^0_k = 1 − ρ and p^1_k = ρ. The probabilities prob(zj | ek = e, {ei | i ∈ V(j) \ k}) for k ∈
V ( j) and e ∈ {0, 1} are also required by the algorithm. These probabilities can be computed
efficiently with an algorithm similar to the Two-Way APP Algorithm of Section 15.4.
The Sum-Product Decoding Algorithm for binary LDPC codes is the following:
I. Initially, set q^e_jk = p^e_k for j ∈ C(k) and e ∈ {0, 1}.
II. Update the values of r^e_jk for k ∈ V(j) and e ∈ {0, 1} according to the equation

r^e_jk = ∑_{ei ∈ {0,1}, i ∈ V(j)\k} prob(zj | ek = e, {ei | i ∈ V(j) \ k}) ∏_{i ∈ V(j)\k} q^{ei}_ji.

III. Update the values of q^e_jk for j ∈ C(k) and e ∈ {0, 1} according to the equation

q^e_jk = α_jk p^e_k ∏_{i ∈ C(k)\j} r^e_ik,

where α_jk is chosen so that q^0_jk + q^1_jk = 1.
IV. For 1 ≤ k ≤ n and e ∈ {0, 1} compute

q^e_k = p^e_k ∏_{j ∈ C(k)} r^e_jk.

V. For 1 ≤ k ≤ n, set ê_k = 0 if q^0_k > q^1_k and ê_k = 1 if q^1_k > q^0_k. Let ê = ê_1 · · · ê_n. If H ê^T = z^T, decode by setting e = ê. Otherwise repeat II, III, and IV up to some maximum number of iterations. Declare a decoding failure if H ê^T never equals z^T.
If the Tanner graph T is without cycles and the algorithm successfully halts, then the probabilities in (15.26) are exactly α_k q^1_k, where α_k is chosen so that α_k q^0_k + α_k q^1_k = 1. If the graph has cycles, these probabilities are approximations to (15.26). In the end, we do not care exactly what these probabilities are; we only care about obtaining a solution to H e^T = z^T. Thus the algorithm can be used successfully even when there are long cycles present. The algorithm is successful for codes of length a few thousand, say n = 10 000, particularly with c small, say c = 3. Analysis of the algorithm can be found in [211].
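For concreteness, here is a sketch of Steps I–V specialized to a binary symmetric channel. The function name and data layout are ours, and the explicit sum in Step II is replaced by the standard binary identity: for independent bits, the probability that the bits of V(j) \ k have even sum is (1 + ∏(q⁰ − q¹))/2.

```python
def sum_product_decode(H, z, rho, max_iters=50):
    """Sum-Product Decoding of a binary code over a BSC with crossover
    probability rho, given the syndrome z (a 0/1 list).  Returns an error
    vector e with H e^T = z^T, or None on failure."""
    m, n = len(H), len(H[0])
    V = [[k for k in range(n) if H[j][k]] for j in range(m)]   # V(j)
    C = [[j for j in range(m) if H[j][k]] for k in range(n)]   # C(k)
    p = [(1 - rho, rho)] * n                                   # (p^0_k, p^1_k)
    q = {(j, k): p[k] for j in range(m) for k in V[j]}         # Step I
    r = {}
    for _ in range(max_iters):
        for j in range(m):                                     # Step II
            for k in V[j]:
                delta = 1.0
                for i in V[j]:
                    if i != k:
                        delta *= q[(j, i)][0] - q[(j, i)][1]
                if z[j] == 1:                                  # check wants odd sum
                    delta = -delta
                r[(j, k)] = ((1 + delta) / 2, (1 - delta) / 2)
        for k in range(n):                                     # Step III
            for j in C[k]:
                q0, q1 = p[k]
                for i in C[k]:
                    if i != j:
                        q0 *= r[(i, k)][0]
                        q1 *= r[(i, k)][1]
                q[(j, k)] = (q0 / (q0 + q1), q1 / (q0 + q1))   # alpha_jk normalizes
        e_hat = []                                             # Steps IV and V
        for k in range(n):
            q0, q1 = p[k]
            for j in C[k]:
                q0 *= r[(j, k)][0]
                q1 *= r[(j, k)][1]
            e_hat.append(1 if q1 > q0 else 0)
        if all(sum(H[j][k] & e_hat[k] for k in range(n)) % 2 == z[j]
               for j in range(m)):
            return e_hat
    return None
```

On the small parity check matrix of Example 15.5.1 (not an LDPC matrix, but the algorithm still applies) with crossover probability 0.1, the syndrome (1, 1, 1)ᵀ decodes in one iteration to the single error in bit 1, the only coordinate involved in all three checks.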
15.7 Turbo codes
Turbo codes were introduced in 1993 by Berrou, Glavieux, and Thitimajshima [20]. A turbo
code consists of a parallel concatenated encoder to which an iterative decoding scheme is
applied. In this section we outline the ideas behind encoding while in the next section we
will discuss turbo decoding; the reader interested in more detail can consult [118, 342, 352].
The importance of turbo coding has probably yet to be fully comprehended. On October
6, 1997, the Cassini spacecraft was launched bound for a rendezvous with Saturn in June,
2004; an experimental turbo code package was included as part of the mission. Turbo codes
will have more earthly applications as well. According to Heegard and Wicker [118, p. 7],
“It is simply not possible to overestimate the impact that the increase in range and/or data
rate resulting from turbo error control will have on the telecommunications industry. The effect will be particularly strong in wireless personal communications systems . . . ."

[Figure 15.11 (Coding gain in satellite communications): bit error rate Pb versus signal-to-noise ratio Eb/N0 in dB, ranging from −0.5 to 3.0 dB and from Pb = 10−1 down to 10−6, for uncoded BPSK, the Voyager, Galileo, and Cassini/Pathfinder coding systems, and four turbo codes (rates 1/3 and 1/6, message lengths m = 1784 and m = 8920, each decoded with 10 iterations).]
The excitement about turbo coding is illustrated by the results shown in Figure 15.11.8 In
Section 15.1, we indicated that one objective of coding is to communicate at signal-to-noise
ratios close to the Shannon limit of −1.6 dB. Figure 15.11 shows the comparison between
the signal-to-noise ratio E b /N0 and the bit error rate Pb of codes used in some satellite
communication packages. This figure also compares four turbo codes to these satellite
(Footnote 8: The data for this figure was provided by Bob McEliece of Caltech and Dariush Divsalar and Fabrizio Pollara of JPL.)
packages. (The value of m is the message length; the number of iterations for the four
turbo codes will be described in Section 15.8.) Figure 15.3 shows the Shannon limit and
also the lowest signal-to-noise ratios that can be obtained for codes of various information
rates. Comparing the two figures indicates the superiority of turbo codes; they achieve
communication at low signal-to-noise ratios very close to the Shannon limit!
A parallel concatenated encoder consists of two or more component encoders which
are usually either binary convolutional codes or block codes with a trellis structure that
leads to efficient soft decision decoding.9 The simplest situation, which we will concentrate
on, involves two component codes, C 1 and C 2 , each with systematic encoders. (Systematic
convolutional encoders are described shortly.) In fact, the encoders for the two codes can
be identical. To be even more concrete, assume that a component code C i is either a (2, 1)
binary convolutional code with encoder G i = [1 gi (D)], where gi (D) is a rational function
or an [n, n/2] binary block code with generator matrix (encoder) G i = [In/2 Ai ]. (The
encoders are not only in systematic form but in standard form.) If the component codes
of the parallel concatenated code are convolutional, the resulting code is called a parallel
concatenated convolutional code or PCCC. Notice that in either case, the component code
is a rate 1/2 code as every message bit generates two codeword bits. We will assume the
usual encoding where the encoding of the message x is xG i . An encoder for the parallel
concatenated code with component encoders G 1 and G 2 is formed as follows. The message
x is encoded with the first code to produce (x, c1), where c1 = x(D)g1(D) if the code is convolutional or c1 = xA1 if the code is a block code. Next the message x is passed to a permuter, also called an interleaver.10 The permuter applies a fixed permutation to the coordinates of x producing the permuted message x̃. The permuted message is encoded with G2 to produce (x̃, c2), where c2 = x̃(D)g2(D) or c2 = x̃A2. The codeword passed on to the channel is an interleaved version of the original message x and the two redundancy strings c1 and c2 that are generated by the codes C1 and C2. This process is pictured in Figure 15.12
and illustrated in Example 15.7.1. In the figure, SC1 and SC2 indicate that the encoders are
in standard form. Notice that the resulting parallel concatenated code has rate 1/3.
Example 15.7.1 Let C1 and C2 each be the [8, 4, 4] extended Hamming code with generator matrix

              1 0 0 0 0 1 1 1
    G1 = G2 = 0 1 0 0 1 0 1 1
              0 0 1 0 1 1 0 1 .
              0 0 0 1 1 1 1 0
Assume the permuter is given by the permutation (1, 3)(2, 4). Suppose that x = 1011 is
the message to be encoded. Then xG 1 = 10110100 yielding c1 = 0100. The permuted
message is x̃ = 1110, which is encoded as x̃G2 = 11100001 to produce c2 = 0001. Then
(Footnote 9: We have not discussed how to form a trellis from a block code. This can be done in an efficient way that allows the use of either hard or soft decision Viterbi decoding presented in Sections 14.2 and 15.2. See [336].)
(Footnote 10: The term "interleaver" as used in parallel concatenated codes is somewhat different from the meaning of the term "interleaving" as used in Section 5.5. The term "interleaver" in connection with turbo codes is so prominent that we include it here, but we will use the term "permuter" for the remainder of this chapter.)
[Figure 15.12 (Parallel concatenated code with two standard form encoders): the message x is sent directly to the channel and into the SC1 encoder, which outputs c1; x also passes through the permuter (interleaver) to the SC2 encoder, which outputs c2.]
(x, c1 , c2 ) = (1011, 0100, 0001) is transmitted in interleaved form (the first bits of x, c1 , c2
followed by the second bits, third bits, and fourth bits) as 100 010 100 101.
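The encoding of Example 15.7.1 can be reproduced in a few lines; the function names and the 0-based index convention below are ours:

```python
# Redundancy part A of the [8, 4, 4] extended Hamming generator G1 = [I4 A]
A = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]

# The permutation (1, 3)(2, 4) written 0-based: position i takes bit perm[i] of x
perm = [2, 3, 0, 1]

def redundancy(x):
    """Compute c = xA over GF(2)."""
    return [sum(x[i] * A[i][j] for i in range(4)) % 2 for j in range(4)]

def pccc_encode(x):
    """Parallel concatenated encoding: c1 from x, c2 from the permuted message,
    output interleaved as the first bits of x, c1, c2, then the second bits,
    and so on."""
    c1 = redundancy(x)
    c2 = redundancy([x[perm[i]] for i in range(4)])
    out = []
    for i in range(4):
        out += [x[i], c1[i], c2[i]]
    return out
```

The message 1011 of the example encodes to 100 010 100 101, as in the text.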
Exercise 863 Find the interleaved form of all 16 codewords of the parallel concatenated
code described in Example 15.7.1. What is the minimum distance of the resulting [12, 4]
binary code?
In Chapter 14 the generator matrices for the convolutional codes that we presented were
generally polynomial. However, the component convolutional codes in turbo codes use
encoders that are most often nonpolynomial, because the encoders will be systematic, as
the next example illustrates.
Example 15.7.2 We illustrate nonpolynomial encoders with the code C1 from Example 14.1.1. If we take the encoder G1 and divide by 1 + D + D², we obtain the generator matrix

G″1 = [1   (1 + D²)/(1 + D + D²)].
A physical encoder for G ′′1 is given in Figure 15.13. This is a feedback circuit. The state
before clock cycle i is the two right-most delay elements which we denote by s(i) and
s(i − 1). Notice these values are not x(i − 1) and x(i − 2) as they were in the feedforward
circuit for the encoder G 1 given in Figure 14.1. The reader should compare Figures 14.1
and 15.13.
To understand why this circuit is the physical encoder for G ′′1 , we analyze the output c(i)
at clock cycle i. Suppose x is the input message, and at time i − 1 the registers contain
x(i), s(i), and s(i − 1). As the next input enters from the left at time i, the output c1 (i) is
clearly x(i), while the output c2 (i) is x(i) + (s(i) + s(i − 1)) + s(i − 1) = x(i) + s(i). The
contents of the registers are now x(i + 1), s(i + 1) = x(i) + s(i) + s(i − 1), and s(i).

[Figure 15.13 (Physical encoder for G″1 = [1 (1 + D²)/(1 + D + D²)]): the input x(i) feeds a two-element feedback shift register holding s(i) and s(i − 1); the output c1(i) is x(i) itself, and c2(i) = x(i) + s(i) is formed from the register taps.]

Thus we can solve for c2(i) using s(i + 1) = x(i) + s(i) + s(i − 1) as follows:
c2 (i) = x(i) + s(i)
= x(i) + x(i − 1) + s(i − 1) + s(i − 2)
= x(i) + x(i − 1) + x(i − 2) + s(i − 2) + s(i − 3) + s(i − 2)
= x(i) + x(i − 1) + x(i − 2) + s(i − 3)
= x(i) + x(i − 1) + x(i − 2) + x(i − 4) + s(i − 4) + s(i − 5).
Continuing, we see that

c2(i) = x(i) + x(i − 1) + x(i − 2) + x(i − 4) + x(i − 5) + x(i − 7) + x(i − 8) + x(i − 10) + x(i − 11) + · · · .     (15.27)
By Exercise 864, we see that the Laurent series expansion for (1 + D 2 )/(1 + D + D 2 ) is
1 + D + D 2 + D 4 + D 5 + D 7 + D 8 + D 10 + D 11 + D 13 + D 14 + · · · .
(15.28)
Thus c2 = x(1 + D 2 )/(1 + D + D 2 ) = x(1 + D + D 2 + D 4 + D 5 + D 7 + D 8 + D 10 +
D 11 + · · · ), agreeing with (15.27). The physical encoder G ′′1 for C 1 has infinite memory by (15.27). (Recall that C 1 has memory 2 since G 1 is a canonical encoder for C 1 by
Example 14.3.2.)
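The feedback recursion of Example 15.7.2 is easy to simulate; the sketch below (function name ours) clocks the register of Figure 15.13 and emits c2:

```python
def rsc_c2(x, n_out):
    """Simulate the feedback encoder of Figure 15.13 over GF(2):
    c2(i) = x(i) + s(i) and s(i+1) = x(i) + s(i) + s(i-1), with both
    registers initially zero.  The input x is padded with trailing zeros."""
    s_prev, s = 0, 0          # s(i-1), s(i)
    out = []
    for i in range(n_out):
        xi = x[i] if i < len(x) else 0
        out.append((xi + s) % 2)
        s_prev, s = s, (xi + s + s_prev) % 2
    return out
```

An impulse input 1000 · · · reproduces the coefficients of the Laurent series (15.28), namely 1 + D + D² + D⁴ + D⁵ + D⁷ + D⁸ + · · · , illustrating the infinite memory of the encoder.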
Exercise 864 Prove that the Laurent series for (1 + D 2 )/(1 + D + D 2 ) is given by (15.28).
Hint: Multiply the series by 1 + D + D 2 .
Exercise 865 Consider the physical encoder G ′′1 in Figure 15.13.
(a) Suppose that x = 1000 · · · is input to the encoder. So x(0) = 1 and x(i) = 0 for i ≥ 1.
Compute c2 giving the contents of the shift-registers for several values of i until the
pattern becomes clear. (The equations in Example 15.7.2 may prove helpful.) Check
your answer by comparing it to the Laurent series for (1 + D 2 )/(1 + D + D 2 ) given
by (15.28).
(b) Suppose that x = 1010000 · · · is input to the encoder. Compute c2 giving the contents
of the shift-registers for several values of i until the pattern becomes clear. Check
your answer by comparing it to the product of 1 + D 2 and the Laurent series for (1 +
D 2 )/(1 + D + D 2 ) given by (15.28).
As with block codes, a systematic generator matrix (or systematic encoder) for an (n, k)
convolutional code is any generator matrix with k columns which are, in some order, the k
columns of the identity matrix Ik . For example, there are two systematic encoders for the
code C 1 of Example 15.7.2: the matrix G ′′1 of that example and
G′′′1 = [(1 + D + D^2)/(1 + D^2)   1].
If the systematic encoder has form [Ik A], the encoder is in standard form. The encoder G ′′1
of Example 15.7.2 is in standard form. A recursive systematic convolutional (RSC) encoder for a convolutional code is a systematic encoder with some nonpolynomial entry of infinite weight. So both G′′1 and G′′′1 are RSC encoders for C1. A physical encoder representing an RSC encoder will involve feedback circuits as in Figure 15.13.
In most cases the component codes of a PCCC are encoded by RSC encoders in order
to attempt to have high minimum weights. Suppose that G = [1 f (D)/g(D)] is an RSC
encoder for a (2, 1) convolutional code where f (D) and g(D) are relatively prime polynomials and g(D) is not a power of D. By Lemma 14.4.2, wt( f (D)/g(D)) = ∞. Suppose
that G is the encoder for both component codes of a PCCC as pictured in Figure 15.12.
If x(D) is an input polynomial, then x(D)G = (x(D), x(D) f (D)/g(D)), which can have
finite weight only if x(D) is a multiple of g(D). With convolutional codes, one wants all
codewords to have high weight to make the free distance high. In particular, if a polynomial
x(D) is input to our PCCC, one hopes that at least one of the two outputs c1 or c2 is of high
weight. Thus if c1 has finite weight, which means x(D) is a multiple of g(D) (a somewhat
rare occurrence), one hopes that the redundancy c2 is of infinite weight. That is the purpose
of the permuter; it permutes the input components of x(D) so that if x(D) is a multiple of g(D), hopefully the permuted input x̃(D) is not a multiple of g(D). Therefore if the permuter is designed
properly, a very high percentage of outputs (c1, c2) will have high weight even if the component codes have small free distances individually. With only relatively few small-weight codewords generated by polynomial input, the overall PCCC generally performs
as if it has high free distance. The design and analysis of permuters can be found in [118,
Chapter 3] and [342, Chapter 7].
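This effect can be illustrated numerically with the code of Example 15.7.2, where f(D) = 1 + D^2 and g(D) = 1 + D + D^2. The input x(D) = g(D) yields the finite-weight parity f(D), while a nearby input whose support is slightly rearranged is not a multiple of g(D) and produces a parity whose Laurent series never stops. A small sketch (helper names are ours):

```python
def poly_mul(a, b):
    """Multiply binary polynomials given as coefficient lists (mod 2)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

def parity_coeffs(x, f, g, n):
    """First n Laurent-series coefficients of x(D)f(D)/g(D) over GF(2)."""
    num = poly_mul(x, f)
    s = []
    for k in range(n):
        acc = num[k] if k < len(num) else 0
        for j in range(1, len(g)):
            if j <= k:
                acc ^= g[j] & s[k - j]
        s.append(acc)
    return s

f, g = [1, 0, 1], [1, 1, 1]                       # f = 1 + D^2, g = 1 + D + D^2
finite = parity_coeffs([1, 1, 1], f, g, 40)       # x = g(D): parity is f(D)
infinite = parity_coeffs([1, 1, 0, 1], f, g, 40)  # x = 1 + D + D^3, no multiple of g
print(sum(finite), sum(infinite))  # finite parity has weight 2; the other keeps growing
```

The first input terminates after two nonzero coefficients; the second produces an eventually periodic stream of nonzero coefficients, i.e. infinite weight.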
Exercise 866 Suppose that G′′1 of Example 15.7.2 is used for both encoders in Figure 15.12.
(a) Let x = x(D) = 1 + D^3 be the input to the first encoder. By multiplying x(D) times (1 + D^2)/(1 + D + D^2), compute c1, giving the resulting polynomial. What is wt(c1)?
(b) Suppose the interleaver permutes the first two components of x to produce x̃ = D + D^3. By multiplying x̃ times the Laurent series (15.28), compute c2, giving the resulting Laurent series. What is wt(c2)?
15.8 Turbo decoding
In this section we describe an iterative decoding algorithm that can be used to decode turbo codes. We rely heavily on the description given in [352]. Assume our
turbo code involves two parallel concatenated codes C 1 and C 2 as presented in the
previous section and summarized in Figure 15.12. The turbo decoding algorithm actually uses two decoders, one for each of the component codes. Suppose that the message x = (x(1), x(2), . . . , x(m)) is transmitted interleaved with the two parity vectors
c1 = (c1 (1), c1 (2), . . . , c1 (m)) and c2 = (c2 (1), c2 (2), . . . , c2 (m)). The received vectors y =
(y(1), y(2), . . . , y(m)), y1 = (y1 (1), y1 (2), . . . , y1 (m)), and y2 = (y2 (1), y2 (2), . . . , y2 (m)),
obtained after deinterleaving, are noisy versions of x, c1 , and c2 . Their components are
received from the demodulator; if, for example, 8-level quantization was used, the components of the three received vectors would be in {0₁, . . . , 1₄}. Ultimately, the job of the
decoder is to find the a posteriori probability
prob(x(i) = x | y, y1, y2)   for x ∈ {0, 1} and 1 ≤ i ≤ m.   (15.29)
Obtaining the exact value of these leads to a rather complex decoder. Berrou, Glavieux,
and Thitimajshima [20] discovered a much less complex decoder that operates iteratively to
obtain an estimate of these probabilities, thereby allowing reliable decoding. The iterative
decoding begins with the code C 1 and uses a priori probabilities about x, the channel
statistics, and the received vectors y and y1 . This first decoder passes on soft information
to the decoder for C 2 that is to be used as a priori probabilities for this second decoder.
This second decoder uses these a priori probabilities about x, the channel statistics, and the
received vectors y and y2 to compute soft information to be passed back to the first decoder.
Passing soft information from the first decoder to the second and back to the first comprises
a single iteration of the turbo decoder; the decoder runs for several iterations. The term
“turbo” refers to the process of feeding information from one part of the decoder to the
other and back again in order to gain improvement in decoding, much like “turbo-charging”
an internal combustion engine.
We begin by examining the first half of the first iteration of the decoding process. This
step uses the first code C 1 and the received vectors y and y1 , with the goal of computing
prob(x(i) = x | y, y1)   for x ∈ {0, 1} and 1 ≤ i ≤ m.   (15.30)
From these probabilities we can obtain the most likely message values using the partial
information obtained from y and y1 but ignoring the information from y2 . But
prob(x(i) = x | y, y1) = prob(x(i) = x, y, y1) / prob(y, y1) = α prob(x(i) = x, y, y1),   (15.31)
where α = 1/prob(y, y1 ). The exact value of α is unimportant because all we care about is
which is greater: prob(x(i) = 0 | y, y1 ) or prob(x(i) = 1 | y, y1 ). This α notation, which is
described more thoroughly in [252], allows us to remove conditional probabilities that do
not affect the relative relationships of the probabilities we are computing; we use it freely
throughout this section.
In order to determine the probabilities in (15.31), for x ∈ {0, 1} and 1 ≤ i ≤ m, let π_i^(0)(x) = prob(x(i) = x) be the a priori probability that the ith message bit is x. If, for example, 0 and 1 are equally likely to be transmitted, then π_i^(0)(x) = 0.5. With convolutional
codes, we saw that certain bits at the end of a message are always 0 in order to send the
encoder to the zero state; then π_i^(0)(0) = 1 and π_i^(0)(1) = 0 for those bits. We assume the channel statistics prob(y | c) (see for instance Example 15.2.1) are known. Using these channel statistics, let λ_i(x) = prob(y(i) | x(i) = x). We have
prob(x(i) = x, y, y1) = Σ_{x: x(i)=x} prob(x, y, y1)
                      = Σ_{x: x(i)=x} prob(y, y1 | x) prob(x)
                      = Σ_{x: x(i)=x} prob(y1 | x) prob(y | x) prob(x).   (15.32)

But,

prob(y | x) prob(x) = ∏_{j=1}^{m} prob(y(j) | x(j)) prob(x(j)) = ∏_{j=1}^{m} λ_j(x(j)) π_j^(0)(x(j))
under the assumption that the channel is memoryless. Also prob(y1 | x) = prob(y1 | c1 )
because c1 is determined directly from x by the encoding using C 1 . Combining these with
(15.30), (15.31), and (15.32), we have
prob(x(i) = x | y, y1) = α Σ_{x: x(i)=x} prob(y1 | c1) ∏_{j=1}^{m} λ_j(x(j)) π_j^(0)(x(j))
                       = α λ_i(x) π_i^(0)(x) Σ_{x: x(i)=x} prob(y1 | c1) ∏_{j=1, j≠i}^{m} λ_j(x(j)) π_j^(0)(x(j)).   (15.33)
Excluding α, (15.33) is a product of three terms. The first, λ_i(x), is the systematic term containing information about x(i) derived from the channel statistics and the received bit y(i). The second, π_i^(0)(x), is the a priori term determined only from the ith message bit. The third is called the extrinsic term, which contains information about x(i) derived from the received parity y1; we denote this term by
π_i^(1)(x) = Σ_{x: x(i)=x} prob(y1 | c1) ∏_{j=1, j≠i}^{m} λ_j(x(j)) π_j^(0)(x(j)).
Notice that the extrinsic information does not include the specific terms λ_i(x(i)) π_i^(0)(x(i)), a
phenomenon suggestive of the requirement placed on message passing that messages passed
along an edge of a Tanner graph cannot involve information previously received along that
edge. Additionally, the sum-product form of this extrinsic information is reminiscent of
the sum-product forms that arise in both the Two-Way APP Decoding Algorithm and the
Sum-Product Decoding Algorithm. This extrinsic information is passed on to the second
decoder as the a priori probability used by that decoder.
The second decoder uses the a priori probabilities, channel statistics, and received vectors
y and y2 to perform the second half of the first iteration. In a manner analogous to the first
half,

prob(x(i) = x | y, y2) = α Σ_{x: x(i)=x} prob(y2 | c2) ∏_{j=1}^{m} λ_j(x(j)) π_j^(1)(x(j))
                       = α λ_i(x) π_i^(1)(x) Σ_{x: x(i)=x} prob(y2 | c2) ∏_{j=1, j≠i}^{m} λ_j(x(j)) π_j^(1)(x(j)).
(The value of α above may be different from that in (15.31); again the exact value is
immaterial.) The extrinsic information

π_i^(2)(x) = Σ_{x: x(i)=x} prob(y2 | c2) ∏_{j=1, j≠i}^{m} λ_j(x(j)) π_j^(1)(x(j))
from this decoder is passed back to the first decoder to use as a priori probabilities to begin
the first half of the second iteration.
Continuing in like manner, iteration I generates the following probabilities:
π_i^(2I−1)(x) = Σ_{x: x(i)=x} prob(y1 | c1) ∏_{j=1, j≠i}^{m} λ_j(x(j)) π_j^(2I−2)(x(j)),

π_i^(2I)(x) = Σ_{x: x(i)=x} prob(y2 | c2) ∏_{j=1, j≠i}^{m} λ_j(x(j)) π_j^(2I−1)(x(j)).
By (15.33) and its analogs,

prob(x(i) = x | y, y1) = α λ_i(x) π_i^(2I−2)(x) π_i^(2I−1)(x),
prob(x(i) = x | y, y2) = α λ_i(x) π_i^(2I−1)(x) π_i^(2I)(x).
Solving these for π_i^(2I−1)(x) and π_i^(2I)(x), respectively, yields

π_i^(n)(x) = { α prob(x(i) = x | y, y1) / (λ_i(x) π_i^(n−1)(x))   if n = 2I − 1,
             { α prob(x(i) = x | y, y2) / (λ_i(x) π_i^(n−1)(x))   if n = 2I.       (15.34)
(Note that the α values are not the same as before.) Our original goal was to compute
(15.29). It turns out that this can be approximated after N complete iterations by
α λ_i(x) π_i^(2N)(x) π_i^(2N−1)(x).
This leads to the Turbo Decoding Algorithm:
I. Until some number of iterations N , compute
π_i^(1)(x), π_i^(2)(x), π_i^(3)(x), . . . , π_i^(2N)(x)
for 1 ≤ i ≤ m and x ∈ {0, 1} using (15.34).
II. For 1 ≤ i ≤ m and x ∈ {0, 1}, compute
γ_i(x) = α λ_i(x) π_i^(2N)(x) π_i^(2N−1)(x).
[Figure 15.14 Turbo decoder: the RSC1 decoder processes y and y1 with a priori probabilities π^(0), passing extrinsic probabilities π^(1), π^(3), π^(5), . . . through the permuter to the RSC2 decoder, which processes y and y2; its extrinsic probabilities π^(2), π^(4), π^(6), . . . feed back to the first decoder, and a convergence-detection block outputs the decoded information symbols.]

Figure 15.14 Turbo decoder.
III. The decoded message symbol x̂(i), estimating x(i), is

x̂(i) = { 0 if γ_i(0) > γ_i(1),
       { 1 if γ_i(1) > γ_i(0),

for 1 ≤ i ≤ m.
The value of N is determined in a number of different ways. For example, the value can simply be fixed at, say, N = 10 (see Figure 15.11). Or the sequence of probabilities π_i^(1)(x), π_i^(2)(x), π_i^(3)(x), . . . can be examined to see how much variability there is with an increasing number of iterations. The computation in Step I of π_i^(n)(x) using (15.34) involves computing prob(x(i) = x | y, y1) and prob(x(i) = x | y, y2). These can often be computed using versions of Two-Way APP Decoding as in Section 15.4. The Turbo Decoding Algorithm is illustrated in Figure 15.14.
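As a concrete illustration of the iteration, the sketch below runs the update rules on a deliberately tiny example: a 4-bit message, a toy accumulator-style recursive encoder standing in for both component encoders (our choice, not the book's G′′1), a fixed permuter, and a binary symmetric channel with crossover probability p, so that λ_i(x) and prob(y_k | c_k) are explicit. Each decoder's posterior is computed by brute-force enumeration rather than Two-Way APP Decoding, and the extrinsic update is exactly (15.34). All names are ours:

```python
from itertools import product

p = 0.1                      # BSC crossover probability (assumed channel)
m = 4                        # message length
perm = [2, 0, 3, 1]          # fixed permuter (our choice)

def accumulate(bits):
    """Toy recursive encoder 1/(1+D): running parity of the input."""
    out, s = [], 0
    for b in bits:
        s ^= b
        out.append(s)
    return out

def channel_prob(received, sent):
    """prob(received | sent) for a memoryless BSC."""
    prob = 1.0
    for r, s in zip(received, sent):
        prob *= (1 - p) if r == s else p
    return prob

def lam(y, i, x):
    """Systematic term lambda_i(x) = prob(y(i) | x(i) = x)."""
    return (1 - p) if y[i] == x else p

def posterior(y, yk, encode, priors):
    """Brute-force prob(x(i) = x | y, y_k), up to the constant alpha."""
    post = [[0.0, 0.0] for _ in range(m)]
    for u in product([0, 1], repeat=m):
        w = channel_prob(yk, encode(list(u)))
        for j in range(m):
            w *= lam(y, j, u[j]) * priors[j][u[j]]
        for i in range(m):
            post[i][u[i]] += w
    return post

def extrinsic(post, y, priors):
    """Update (15.34): divide out the systematic and a priori terms."""
    new = []
    for i in range(m):
        e = [post[i][x] / (lam(y, i, x) * priors[i][x]) for x in (0, 1)]
        t = e[0] + e[1]
        new.append([e[0] / t, e[1] / t])   # normalization absorbs alpha
    return new

def turbo_decode(y, y1, y2, iterations=3):
    enc1 = accumulate
    enc2 = lambda u: accumulate([u[q] for q in perm])
    pri = [[0.5, 0.5] for _ in range(m)]             # pi^(0)
    for _ in range(iterations):
        ext1 = extrinsic(posterior(y, y1, enc1, pri), y, pri)    # odd step
        ext2 = extrinsic(posterior(y, y2, enc2, ext1), y, ext1)  # even step
        pri = ext2
    gamma = [[lam(y, i, x) * ext1[i][x] * ext2[i][x] for x in (0, 1)]
             for i in range(m)]
    return [0 if g[0] > g[1] else 1 for g in gamma]

msg = [1, 0, 1, 1]
c1 = accumulate(msg)
c2 = accumulate([msg[q] for q in perm])
print(turbo_decode(msg, c1, c2))   # noiseless reception → [1, 0, 1, 1]
```

With noiseless reception the decoder recovers the message; replacing the brute-force `posterior` with a trellis-based APP decoder is what makes the same exchange practical at realistic block lengths.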
15.9 Some space history
We conclude this chapter with a brief summary of the use of error-correcting codes in the
history of space exploration. A more thorough account can be found in [352].
The first error-correcting code developed for use in deep space was a nonlinear
(32, 64, 16) code, a coset of the binary [32, 6, 16] Reed–Muller code R(1, 5), capable
of correcting seven errors. This code was used on the Mariner 6 and 7 Mars missions
launched, respectively, on February 24 and March 27, 1969. Mariner 6 flew within 2131
miles of the Martian equator on July 30, 1969, and Mariner 7 flew within 2130 miles of the
southern hemisphere on August 4, 1969. The Mariner spacecrafts sent back a total of 201
gray-scale photographs with each pixel given one of 26 = 64 shades of gray. The code was
used on later Mariner missions and also on the Viking Mars landers. The coding gain using
this code was 2.2 dB compared to uncoded BPSK at a BER of 5 × 10−3 . While the chosen
code had a relatively low rate 6/32 among codes capable of correcting seven errors, it had
the advantage of having an encoder and decoder that were easy to implement. The decoder,
known as the Green machine, after its developer R. R. Green of NASA’s Jet Propulsion
Laboratory (JPL), uses computations with Hadamard matrices to perform the decoding
chores.
While the Mariner code was the first developed for space application, it was not actually
the first to be launched. An encoding system based on a (2, 1, 20) convolutional code with
generator matrix G = [g1 (D) g2 (D)], where
g1(D) = 1,
g2(D) = 1 + D + D^2 + D^5 + D^6 + D^8 + D^9 + D^12 + D^13 + D^14 + D^16 + D^17 + D^18 + D^19 + D^20,
was launched aboard Pioneer 9 on November 8, 1968, headed for solar orbit. Sequential
decoding, described in [197, 232, 351], was used to decode received data. Pioneer 9 continued to transmit information about solar winds, interplanetary electron density and magnetic
fields, cosmic dust, and the results of other ongoing experiments until 1983. This code with
its sequential decoder provided a coding gain of 3.3 dB at a BER of 7.7 × 10−4 .
The Pioneer 10 mission to Jupiter launched in 1972 and the Pioneer 11 mission to Saturn
launched a year later carried a (2, 1, 31, 21) quick-look-in convolutional code with encoder
[g1 (D) g2 (D)] developed by Massey and Costello [228], where
g1(D) = 1 + D + D^2 + D^4 + D^5 + D^7 + D^8 + D^9 + D^11 + D^13 + D^14 + D^16 + D^17 + D^18 + D^19 + D^21 + D^22 + D^23 + D^24 + D^25 + D^27 + D^28 + D^29 + D^31,
g2(D) = g1(D) + D.
A quick-look-in convolutional code is a (2, 1) code with encoder [g1 (D) g2 (D)], where
deg g1 (D) = deg g2 (D), the constant term of both g1 (D) and g2 (D) is 1, and g1 (D) +
g2 (D) = D. Exercise 867 shows why this is called a quick-look-in code.
Exercise 867 Let C be a quick-look-in (2, 1) convolutional code with encoder G =
[g1 (D) g2 (D)].
(a) Prove that C is basic and noncatastrophic. Hint: Any divisor of g1 (D) and g2 (D) is a
divisor of g1 (D) + g2 (D).
(b) Let x(D) be input to the code and c(D) = (c1 (D), c2 (D)) = x(D)G be the output. Show
that c1 (D) + c2 (D) = x(D)D.
Note that (b) shows the input can be obtained easily from the output by adding the two
components of the corrected received vector; it is merely delayed by one unit of time.
Hence the receiver can quickly look at the corrected vector to obtain the input. Note that
(b) also shows that C is noncatastrophic because if the output has finite weight, the input
must have finite weight as well.
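The quick-look-in property of Exercise 867(b) is easy to check numerically for the Pioneer 10/11 encoder: over GF(2), c1(D) + c2(D) = x(D)(g1(D) + g2(D)) = x(D)D, so XORing the two output streams returns the input delayed by one time unit. A small sketch (polynomials are coefficient lists; names are ours):

```python
def poly_mul(a, b):
    """Multiply binary polynomials (coefficient lists, index = power of D)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

# Massey-Costello quick-look-in encoder: g2(D) = g1(D) + D.
g1_powers = [0, 1, 2, 4, 5, 7, 8, 9, 11, 13, 14, 16, 17, 18, 19,
             21, 22, 23, 24, 25, 27, 28, 29, 31]
g1 = [1 if k in g1_powers else 0 for k in range(32)]
g2 = g1[:]
g2[1] ^= 1                       # flip the coefficient of D

x = [1, 0, 1, 1, 0, 0, 1]        # an arbitrary input polynomial
c1 = poly_mul(x, g1)
c2 = poly_mul(x, g2)
quick_look = [a ^ b for a, b in zip(c1, c2)]
print(quick_look[:len(x) + 1])   # → [0, 1, 0, 1, 1, 0, 0, 1], the delayed input
```

The XOR of the two encoded streams is exactly the input shifted by one position, with all later coefficients zero.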
The use of convolutional codes in deep space exploration became even more prevalent
with the development of the Viterbi Algorithm. Sequential decoding has the disadvantage
that it does not have a fixed decoding time. While the Viterbi Algorithm does not have this
disadvantage, it cannot be used efficiently when the memory is too high. In the mid 1980s,
the Consultative Committee for Space Data Systems (CCSDS) adopted as its standard a
concatenated coding system with the inner encoder consisting of either a (2, 1, 6, 10) or
(3, 1, 6, 15) convolutional code. The (2, 1, 6, 10) code has generator matrix [g1 (D) g2 (D)],
where
g1(D) = 1 + D + D^3 + D^4 + D^6,
g2(D) = 1 + D^3 + D^4 + D^5 + D^6.

The (3, 1, 6, 15) code has generator matrix [g1(D) g2(D) g3(D)], where g1(D) and g2(D) are as above and g3(D) = 1 + D^2 + D^4 + D^5 + D^6. At a BER of 1 × 10^−5, the (2, 1, 6, 10)
code provides a coding gain of 5.4 dB, and the (3, 1, 6, 15) code provides a coding gain
of 5.7 dB. Both codes can be decoded efficiently using the Viterbi Algorithm but the
(2, 1, 6, 10) code has become the preferred choice.
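The free distance of 10 claimed by the (2, 1, 6, 10) label can be spot-checked by brute force: for a polynomial encoder the codeword weight for input x(D) is wt(x g1) + wt(x g2), and searching delay-free inputs of modest degree finds the minimum over that range. A hedged sketch (exhaustive only over the searched inputs; names are ours):

```python
def poly_mul(a, b):
    """Multiply binary polynomials given as coefficient lists (mod 2)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

# CCSDS (2, 1, 6, 10) standard code.
g1 = [1, 1, 0, 1, 1, 0, 1]       # 1 + D + D^3 + D^4 + D^6
g2 = [1, 0, 0, 1, 1, 1, 1]       # 1 + D^3 + D^4 + D^5 + D^6

best = None
for bits in range(1 << 12):      # delay-free inputs of degree <= 12
    x = [1] + [(bits >> k) & 1 for k in range(12)]
    w = sum(poly_mul(x, g1)) + sum(poly_mul(x, g2))
    best = w if best is None else min(best, w)
print(best)                      # → 10, matching the code's free distance
```

The input x(D) = 1 already achieves weight wt(g1) + wt(g2) = 5 + 5 = 10, and no searched input does better.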
Voyager 1 and 2 were launched in the summer of 1977 destined to explore Jupiter, Saturn,
and their moons. These spacecraft transmitted two different types of data: imaging and GSE
(general science and engineering). The imaging system required less reliability than the GSE
system and used the (2, 1, 6, 10) code for its encoding. The GSE data was transmitted using
a concatenated code. The outer encoder was the [24, 12, 8] extended binary Golay code.
After encoding using the binary Golay code, codeword interleaving was performed with
the result passed to the inner encoder, which was the (2, 1, 6, 10) code. The process was
reversed in decoding. (The imaging data merely bypassed the Golay code portion of the
encoding and decoding process.) Partially because of the effective use of coding, NASA
engineers were able to extend the mission of Voyager 2 to include flybys of Uranus and
Neptune. To accomplish this, the GSE data was transmitted using a concatenated code with
outer encoder a [256, 224, 33] Reed–Solomon code over F256 , in place of the Golay code,
followed by the (2, 1, 6, 10) inner encoder. Later NASA missions, including Galileo and
some European Space Agency (ESA) missions such as the Giotto mission to Halley’s Comet,
used similar error-correction coding involving a Reed–Solomon code and a convolutional
code with the added step of interleaving after encoding with the Reed–Solomon code. The
coding gain for the Voyager coding system is shown in Figure 15.11.
The Galileo mission to study Jupiter and its atmosphere was fraught with problems from
the outset. Originally scheduled for launch from the space shuttle in early 1986, the mission
was delayed until late 1989 due to the shuttle Challenger in-flight explosion on January 28,
1986. Galileo was launched with two different concatenated encoders. The outer encoder
of both systems was the [256, 224, 33] Reed–Solomon code. The inner encoder for one
system was the (2, 1, 6, 10) CCSDS standard code, while the inner encoder for the other
was a (4, 1, 14) convolutional code with generator matrix [g1 (D) g2 (D) g3 (D) g4 (D)],
where
g1(D) = 1 + D^3 + D^4 + D^7 + D^8 + D^10 + D^14,
g2(D) = 1 + D^2 + D^5 + D^7 + D^9 + D^10 + D^11 + D^14,
g3(D) = 1 + D + D^4 + D^5 + D^6 + D^7 + D^9 + D^10 + D^12 + D^13 + D^14,
g4(D) = 1 + D + D^2 + D^6 + D^8 + D^10 + D^11 + D^12 + D^14.
At each time increment, the Viterbi decoder for the (4, 1, 14) code had to examine 214 =
16 384 states. To accomplish this daunting task, JPL completed the Big Viterbi Decoder in
1991 (over two years after Galileo’s launch!). This decoder could operate at one million bits
per second. En route to Jupiter, the high gain X-band antenna, designed for transmission of
Galileo’s data at a rate of 100 000 bits per second, failed to deploy. As a result a low gain
antenna was forced to take over data transmission. However, the information transmission
rate for the low gain antenna was only 10 bits per second. Matters were further complicated
because the encoding by the (4, 1, 14) inner encoder was hard-wired into the high gain
antenna. The JPL scientists were able to implement software changes to improve the data
transmission and error-correction capability of Galileo, which in turn have improved data
transmission in the CCSDS standard. In the end, the outer Reed–Solomon encoder originally
on Galileo was replaced with another (2, 1) convolutional encoder; the inner encoder used
was the original (2, 1, 6, 10) code already on board.
The Cassini spacecraft was launched on October 6, 1997, for a rendezvous with Saturn in
June 2004. Aboard is the ESA probe Huygens, which will be dropped into the atmosphere of
the moon Titan. Aboard Cassini are hardware encoders for the (2, 1, 6, 10) CCSDS standard
convolutional code, the (4, 1, 14) code that is aboard Galileo, and a (6, 1, 14) convolutional
code. A JPL team of Divsalar, Dolinar, and Pollara created a turbo code that has been placed
aboard Cassini as an experimental package. This turbo code uses two (2, 1) convolutional
codes in parallel. The turbo code is a modification of that given in Figure 15.12; the first code
uses the RSC encoder [1 g1(D)/g2(D)], where g1(D) = 1 + D^2 + D^3 + D^5 + D^6 and g2(D) = 1 + D + D^2 + D^3 + D^6. The second encoder is [g1(D)/g3(D) g2(D)/g3(D)], where g3(D) = 1 + D; this encoder is not systematic but is recursive. On Earth, a turbo decoder that is much less complex than the Big Viterbi Decoder is to be used with the (6, 1, 14)
convolutional code. Simulations have shown, as indicated in Figure 15.11, that the turbo
code outperforms the concatenated code consisting of an outer Reed–Solomon code and
the inner (6, 1, 14) code.
References
[1] M. J. Adams, “Subcodes and covering radius,” IEEE Trans. Inform. Theory IT–32 (1986),
700–701.
[2] E. Agrell, A. Vardy, and K. Zeger, “A table of upper bounds for binary codes,” IEEE Trans.
Inform. Theory IT–47 (2001), 3004–3006.
[3] E. Artin, Geometric Algebra. Interscience Tracts in Pure and Applied Mathematics No. 3. New
York: Interscience, 1957.
[4] E. F. Assmus, Jr. and J. D. Key, Designs and Their Codes. London: Cambridge University
Press, 1993.
[5] E. F. Assmus, Jr. and J. D. Key, “Polynomial codes and finite geometries,” in Handbook of
Coding Theory, eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 1269–
1343.
[6] E. F. Assmus, Jr. and H. F. Mattson, Jr., “New 5-designs,” J. Comb. Theory 6 (1969), 122–151.
[7] E. F. Assmus, Jr. and H. F. Mattson, Jr. “Some 3-error correcting BCH codes have covering
radius 5,” IEEE Trans. Inform. Theory IT–22 (1976), 348–349.
[8] E. F. Assmus, Jr., H. F. Mattson, Jr., and R. J. Turyn, “Research to develop the algebraic theory
of codes,” Report AFCRL-67-0365, Air Force Cambridge Res. Labs., Bedford, MA, June 1967.
[9] E. F. Assmus, Jr. and V. Pless, “On the covering radius of extremal self-dual codes,” IEEE
Trans. Inform. Theory IT–29 (1983), 359–363.
[10] D. Augot, P. Charpin, and N. Sendrier, “Studying the locator polynomials of minimum weight
codewords of BCH codes,” IEEE Trans. Inform. Theory IT–38 (1992), 960–973.
[11] D. Augot and L. Pecquet, “A Hensel lifting to replace factorization in list-decoding of algebraicgeometric and Reed–Solomon codes,” IEEE Trans. Inform. Theory IT–46 (2000), 2605–2614.
[12] C. Bachoc, “On harmonic weight enumerators of binary codes,” Designs, Codes and Crypt. 18
(1999), 11–28.
[13] C. Bachoc and P. H. Tiep, “Appendix: two-designs and code minima,” appendix to: W. Lempken,
B. Schröder, and P. H. Tiep, “Symmetric squares, spherical designs, and lattice minima,”
J. Algebra 240 (2001), 185–208; appendix pp. 205–208.
[14] A. Barg, “The matroid of supports of a linear code,” Applicable Algebra in Engineering,
Communication and Computing (AAECC Journal) 8 (1997), 165–172.
[15] B. I. Belov, “A conjecture on the Griesmer bound,” in Proc. Optimization Methods and Their
Applications, All Union Summer Sem., Lake Baikal (1972), 100–106.
[16] T. P. Berger and P. Charpin, “The automorphism group of BCH codes and of some affine-invariant codes over extension fields,” Designs, Codes and Crypt. 18 (1999), 29–53.
[17] E. R. Berlekamp, ed., Key Papers in the Development of Coding Theory. New York: IEEE
Press, 1974.
[18] E. R. Berlekamp, Algebraic Coding Theory. Laguna Hills, CA: Aegean Park Press, 1984.
[19] E. R. Berlekamp, F. J. MacWilliams, and N. J. A. Sloane, “Gleason’s theorem on self-dual
codes,” IEEE Trans. Inform. Theory IT–18 (1972), 409–414.
[20] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding
and decoding: turbo codes,” Proc. of the 1993 IEEE Internat. Communications Conf., Geneva,
Switzerland (May 23–26, 1993), 1064–1070.
[21] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, MA: Addison-Wesley,
1983.
[22] R. E. Blahut, “Decoding of cyclic codes and codes on curves,” in Handbook of Coding Theory,
eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 1569–1633.
[23] I. F. Blake, ed., Algebraic Coding Theory: History and Development. Stroudsburg, PA: Dowden,
Hutchinson, & Ross, Inc., 1973.
[24] K. Bogart, D. Goldberg, and J. Gordon, “An elementary proof of the MacWilliams theorem on
equivalence of codes,” Inform. and Control 37 (1978), 19–22.
[25] A. Bonisoli, “Every equidistant linear code is a sequence of dual Hamming codes,” Ars Combin.
18 (1984), 181–186.
[26] A. Bonnecaze, P. Solé, C. Bachoc, and B. Mourrain, “Type II codes over Z4 ,” IEEE Trans.
Inform. Theory IT–43 (1997), 969–976.
[27] A. Bonnecaze, P. Solé, and A. R. Calderbank, “Quaternary quadratic residue codes and unimodular lattices,” IEEE Trans. Inform. Theory IT–41 (1995), 366–377.
[28] R. C. Bose and D. K. Ray-Chaudhuri, “On a class of error correcting binary group codes,”
Inform. and Control 3 (1960), 68–79. (Also reprinted in [17] pp. 75–78 and [23] pp. 165–176.)
[29] R. C. Bose and D. K. Ray-Chaudhuri, “Further results on error correcting binary group codes,”
Inform. and Control 3 (1960), 279–290. (Also reprinted in [17] pp. 78–81 and [23] pp. 177–
188.)
[30] S. Bouyuklieva, “A method for constructing self-dual codes with an automorphism of order 2,”
IEEE Trans. Inform. Theory IT–46 (2000), 496–504.
[31] A. E. Brouwer, “The linear programming bound for binary linear codes,” IEEE Trans. Inform.
Theory IT–39 (1993), 677–688.
[32] A. E. Brouwer, “Bounds on the size of linear codes,” in Handbook of Coding Theory, eds.
V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 295–461.
[33] R. A. Brualdi, N. Cai, and V. S. Pless, “Orphan structure of the first-order Reed–Muller codes,”
Discrete Math. 102 (1992), 239–247.
[34] R. A. Brualdi, S. Litsyn, and V. S. Pless, “Covering radius,” in Handbook of Coding Theory,
eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 755–826.
[35] R. A. Brualdi and V. Pless, “Subcodes of Hamming codes,” Congr. Numer. 70 (1990), 153–158.
[36] R. A. Brualdi and V. S. Pless, “Orphans of the first order Reed–Muller codes,” IEEE Trans.
Inform. Theory IT–36 (1990), 399–401.
[37] R. A. Brualdi and V. S. Pless, “On the covering radius of a code and its subcodes,” Discrete
Math. 83 (1990), 189–199.
[38] R. A. Brualdi and V. Pless, “Weight enumerators of self-dual codes,” IEEE Trans. Inform.
Theory IT–37 (1991), 1222–1225.
[39] R. A. Brualdi and V. S. Pless, “Greedy Codes,” J. Comb. Theory 64A (1993), 10–30.
[40] R. A. Brualdi, V. Pless, and J. S. Beissinger, “On the MacWilliams identities for linear codes,”
Linear Alg. and Its Applic. 107 (1988), 181–189.
[41] R. A. Brualdi, V. Pless, and R. M. Wilson, “Short codes with a given covering radius,” IEEE
Trans. Inform. Theory IT–35 (1989), 99–109.
[42] K. A. Bush,“Orthogonal arrays of index unity,” Ann. Math. Stat. 23 (1952), 426–434.
[43] A. R. Calderbank, “Covering radius and the chromatic number of Kneser graphs,” J. Comb.
Theory 54A (1990), 129–131.
[44] A. R. Calderbank, E. M. Rains, P. W. Shor, and N. J. A. Sloane, “Quantum error correction via
codes over GF(4),” IEEE Trans. Inform. Theory IT–44 (1998), 1369–1387.
[45] P. Camion, B. Courteau, and A. Monpetit, “Coset weight enumerators of the extremal self-dual
binary codes of length 32,” in Proc. of Eurocode 1992, Udine, Italy, CISM Courses and Lectures
No. 339, eds. P. Camion, P. Charpin, and S. Harari. Vienna: Springer, 1993, pp. 17–29.
[46] A. Canteaut and F. Chabaud, “A new algorithm for finding minimum weight codewords in
a linear code: application to primitive narrow-sense BCH codes of length 511,” IEEE Trans.
Inform. Theory IT–44 (1998), 367–378.
[47] M. G. Carasso, J. B. H. Peek, and J. P. Sinjou, “The compact disc digital audio system,” Philips
Technical Review 40 No. 6 (1982), 151–155.
[48] J. L. Carter, “On the existence of a projective plane of order 10,” Ph.D. Thesis, University of
California, Berkeley, 1974.
[49] G. Castagnoli, J. L. Massey, P. A. Schoeller, and N. von Seeman, “On repeated-root cyclic
codes,” IEEE Trans. Inform. Theory IT–37 (1991), 337–342.
[50] P. Charpin, “Open problems on cyclic codes,” in Handbook of Coding Theory, eds. V. S. Pless
and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 963–1063.
[51] G. D. Cohen, I. S. Honkala, S. Litsyn, and A. Lobstein, Covering Codes. Amsterdam: Elsevier,
1997.
[52] G. D. Cohen, A. C. Lobstein, and N. J. A. Sloane, “Further results on the covering radius of
codes,” IEEE Trans. Inform. Theory IT–32 (1986), 680–694.
[53] J. H. Conway and V. Pless, “On the enumeration of self-dual codes,” J. Comb. Theory 28A
(1980), 26–53.
[54] J. H. Conway and V. Pless, “On primes dividing the group order of a doubly-even (72, 36, 16)
code and the group of a quaternary (24, 12, 10) code,” Discrete Math. 38 (1982), 143–156.
[55] J. H. Conway, V. Pless, and N. J. A. Sloane, “Self-dual codes over GF(3) and GF(4) of length
not exceeding 16,” IEEE Trans. Inform. Theory IT–25 (1979), 312–322.
[56] J. H. Conway, V. Pless, and N. J. A. Sloane, “The binary self-dual codes of length up to 32: a
revised enumeration,” J. Comb. Theory 60A (1992), 183–195.
[57] J. H. Conway and N. J. A. Sloane, “Lexicographic codes: error-correcting codes from game
theory,” IEEE Trans. Inform. Theory IT–32 (1986), 337–348.
[58] J. H. Conway and N. J. A. Sloane, “A new upper bound on the minimal distance of self-dual
codes,” IEEE Trans. Inform. Theory IT–36 (1990), 1319–1333.
[59] J. H. Conway and N. J. A. Sloane, “Self-dual codes over the integers modulo 4,” J. Comb.
Theory 62A (1993), 30–45.
[60] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, 3rd ed. New York:
Springer-Verlag, 1999.
[61] P. Delsarte, “Bounds for unrestricted codes by linear programming,” Philips Research Report
27 (1972), 272–289.
[62] P. Delsarte, “Four fundamental parameters of a code and their combinatorial significance,”
Inform. and Control 23 (1973), 407–438.
[63] P. Delsarte, “An algebraic approach to the association schemes of coding theory,” Philips
Research Reports Supplements No. 10 (1973).
[64] P. Delsarte, “On subfield subcodes of Reed–Solomon codes,” IEEE Trans. Inform. Theory
IT–21 (1975), 575–576.
[65] P. Delsarte and J. M. Goethals, “Unrestricted codes with the Golay parameters are unique,”
Discrete Math. 12 (1975), 211–224.
[66] L. E. Dickson, Linear Groups. New York: Dover, 1958.
[67] C. Ding and V. Pless, “Cyclotomy and duadic codes of prime lengths,” IEEE Trans. Inform.
Theory IT–45 (1999), 453–466.
[68] S. M. Dodunekov, “A remark on the weight structure of generator matrices of linear codes” (in Russian), Prob. peredach. inform. 26 (1990), 101–104.
[69] S. M. Dodunekov and N. L. Manev, “Minimum possible block length of a linear code for some
distance,” Problems of Inform. Trans. 20 (1984), 8–14.
[70] S. M. Dodunekov and N. L. Manev, “An improvement of the Griesmer bound for some small
minimum distances,” Discrete Applied Math. 12 (1985), 103–114.
[71] R. Dougherty and H. Janwa, “Covering radius computations for binary cyclic codes,” Mathematics of Computation 57 (1991), 415–434.
[72] D. E. Downie and N. J. A. Sloane, “The covering radius of cyclic codes of length up to 31,”
IEEE Trans. Inform. Theory IT–31 (1985), 446–447.
[73] A. Drápal, “Yet another approach to ternary Golay codes,” Discrete Math. 256 (2002), 459–464.
[74] V. G. Drinfeld and S. G. Vlăduţ, “The number of points on an algebraic curve,” Functional Anal. Appl. 17 (1983), 53–54.
[75] I. Dumer, “Concatenated codes and their multilevel generalizations,” in Handbook of Coding
Theory, eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 1911–1988.
[76] W. Ebeling, Lattices and Codes. Wiesbaden, Germany: Friedr Vieweg & Sohn Verlagsgesellschaft, 1994.
[77] P. Elias, “Error-free coding,” IRE Trans. Inform. Theory IT–4 (1954), 29–37. (Also reprinted
in [17] pp. 39–47.)
[78] P. Elias, “Coding for noisy channels,” 1955 IRE International Convention Record (part 4),
37–46. (Also reprinted in [17] pp. 48–55.)
[79] T. Etzion, A. Trachtenberg, and A. Vardy, “Which codes have cycle-free Tanner graphs?” IEEE
Trans. Inform. Theory IT–45 (1999), 2173–2181.
[80] W. Feit, “A self-dual even (96,48,16) code,” IEEE Trans. Inform. Theory IT–20 (1974), 136–
138.
[81] L. Fejes Tóth, “Über einen geometrischen Satz,” Math. Zeit. 46 (1940), 79–83.
[82] J. E. Fields, P. Gaborit, W. C. Huffman, and V. Pless, “On the classification of formally self-dual codes,” Proc. 36th Allerton Conf. on Commun. Control and Computing (September 23–25, 1998), 566–575.
[83] J. E. Fields, P. Gaborit, W. C. Huffman, and V. Pless, “On the classification of extremal even
formally self-dual codes,” Designs, Codes and Crypt. 18 (1999), 125–148.
[84] J. E. Fields, P. Gaborit, W. C. Huffman, and V. Pless, “On the classification of extremal even
formally self-dual codes of lengths 20 and 22,” Discrete Applied Math. 111 (2001), 75–86.
[85] J. Fields, P. Gaborit, J. Leon, and V. Pless, “All self-dual Z4 codes of length 15 or less are
known,” IEEE Trans. Inform. Theory IT–44 (1998), 311–322.
[86] G. D. Forney, Jr., “On decoding BCH codes,” IEEE Trans. Inform. Theory IT–11 (1965),
549–557. (Also reprinted in [17] pp. 136–144.)
[87] G. D. Forney, Jr., Concatenated Codes. Cambridge, MA: MIT Press, 1966.
[88] G. D. Forney, Jr., “Convolutional codes I: algebraic structure,” IEEE Trans. Inform. Theory
IT–16 (1970), 720–738.
[89] G. D. Forney, Jr., “Structural analysis of convolutional codes via dual codes,” IEEE Trans.
Inform. Theory IT–19 (1973), 512–518.
[90] G. D. Forney, Jr., “Minimal bases of rational vector spaces with applications to multivariable
linear systems,” SIAM J. Control 13 (1975), 493–502.
[91] G. D. Forney, Jr., “On iterative decoding and the two-way algorithm,” preprint.
619
References
[92] P. Gaborit, “Mass formulas for self-dual codes over Z4 and Fq + uFq rings,” IEEE Trans.
Inform. Theory IT–42 (1996), 1222–1228.
[93] P. Gaborit, W. C. Huffman, J.-L. Kim, and V. Pless, “On additive GF(4) codes,” in Codes and
Association Schemes (DIMACS Workshop, November 9–12, 1999), eds. A. Barg and S. Litsyn.
Providence, RI: American Mathematical Society, 2001, pp. 135–149.
[94] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inform. Theory IT–8 (1962),
21–28.
[95] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[96] A. Garcia and H. Stichtenoth, “A tower of Artin–Schreier extensions of function fields attaining
the Drinfeld–Vlăduţ bound,” Invent. Math. 121 (1995), 211–222.
[97] A. Garcia and H. Stichtenoth, “On the asymptotic behavior of some towers of function fields
over finite fields,” J. Number Theory 61 (1996), 248–273.
[98] E. N. Gilbert, “A comparison of signaling alphabets,” Bell System Tech. J. 31 (1952), 504–522.
(Also reprinted in [17] pp. 14–19 and [23] pp. 24–42.)
[99] D. Goedhart, R. J. van de Plassche, and E. F. Stikvoort, “Digital-to-analog conversion in playing
a compact disc,” Philips Technical Review 40 No. 6 (1982), 174–179.
[100] J.-M. Goethals, “On the Golay perfect binary code,” J. Comb. Theory 11 (1971), 178–186.
[101] J.-M. Goethals and S. L. Snover, “Nearly perfect codes,” Discrete Math. 3 (1972), 65–88.
[102] M. J. E. Golay, “Notes on digital coding,” Proc. IRE 37 (1949), 657. (Also reprinted in [17]
p. 13 and [23] p. 9.)
[103] V. D. Goppa, “A new class of linear error-correcting codes,” Problems of Inform. Trans. 6
(1970), 207–212.
[104] V. D. Goppa, “Rational representation of codes and (L, g)-codes,” Problems of Inform. Trans.
7 (1971), 223–229.
[105] V. D. Goppa, “Codes associated with divisors,” Problems of Inform. Trans. 13 (1977), 22–26.
[106] V. D. Goppa, “Codes on algebraic curves,” Soviet Math. Dokl. 24 (1981), 170–172.
[107] D. M. Gordon, “Minimal permutation sets for decoding the binary Golay codes,” IEEE Trans.
Inform. Theory IT–28 (1982), 541–543.
[108] D. C. Gorenstein, W. W. Peterson, and N. Zierler, “Two error-correcting Bose–Chaudhuri codes
are quasi-perfect,” Inform. and Control 3 (1960), 291–294.
[109] D. C. Gorenstein and N. Zierler, “A class of error-correcting codes in p^m symbols,” J. SIAM 9
(1961), 207–214. (Also reprinted in [17] pp. 87–89 and [23] pp. 194–201.)
[110] T. Gosset, “On the regular and semi-regular figures in space of n dimensions,” Messenger Math.
29 (1900), 43–48.
[111] R. L. Graham and N. J. A. Sloane, “On the covering radius of codes,” IEEE Trans. Inform.
Theory IT–31 (1985), 385–401.
[112] J. H. Griesmer, “A bound for error-correcting codes,” IBM J. Research Develop. 4 (1960),
532–542.
[113] V. Guruswami and M. Sudan, “Improved decoding of Reed–Solomon and algebraic-geometry
codes,” IEEE Trans. Inform. Theory IT–45 (1999), 1757–1767.
[114] N. Hamada, “A characterization of some [n, k, d; q]-codes meeting the Griesmer bound using
a minihyper in a finite projective geometry,” Discrete Math. 116 (1993), 229–268.
[115] R. W. Hamming, Coding and Information Theory, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall,
1986.
[116] A. R. Hammons, P. V. Kumar, A. R. Calderbank, N. J. A. Sloane, and P. Solé, “The Z4 -linearity
of Kerdock, Preparata, Goethals, and related codes,” IEEE Trans. Inform. Theory IT–40 (1994),
301–319.
[117] C. R. P. Hartmann and K. K. Tzeng, “Generalizations of the BCH bound,” Inform. and Control
20 (1972), 489–498.
[118] C. Heegard and S. B. Wicker, Turbo Coding. Norwell, MA: Kluwer Academic Publishers,
1999.
[119] J. P. J. Heemskerk and K. A. S. Immink, “Compact disc: system aspects and modulation,”
Philips Technical Review 40 No. 6 (1982), 157–164.
[120] H. J. Helgert and R. D. Stinaff, “Minimum distance bounds for binary linear codes,” IEEE
Trans. Inform. Theory IT–19 (1973), 344–356.
[121] T. Helleseth, “All binary 3-error-correcting BCH codes of length 2^m − 1 have covering radius
5,” IEEE Trans. Inform. Theory IT–24 (1978), 257–258.
[122] T. Helleseth, “Projective codes meeting the Griesmer bound,” Discrete Math. 107 (1992), 265–
271.
[123] T. Helleseth and T. Kløve, “The Newton radius of codes,” IEEE Trans. Inform. Theory IT–43
(1997), 1820–1831.
[124] T. Helleseth, T. Kløve, V. I. Levenshtein, and O. Ytrehus, “Bounds on the minimum support
weights,” IEEE Trans. Inform. Theory IT–41 (1995), 432–440.
[125] T. Helleseth, T. Kløve, and J. Mykkeltveit, “The weight distribution of irreducible cyclic codes
with block length n_1((q^ℓ − 1)/N),” Discrete Math. 18 (1977), 179–211.
[126] T. Helleseth, T. Kløve, and J. Mykkeltveit, “On the covering radius of binary codes,” IEEE
Trans. Inform. Theory IT–24 (1978), 627–628.
[127] T. Helleseth, T. Kløve, and O. Ytrehus, “Codes and the chain condition,” Proc. Int. Workshop
on Algebraic and Combinatorial Coding Theory (Voneshta Voda, Bulgaria June 22–28, 1992),
88–91.
[128] T. Helleseth, T. Kløve, and O. Ytrehus, “Generalized Hamming weights of linear codes,” IEEE
Trans. Inform. Theory IT–38 (1992), 1133–1140.
[129] T. Helleseth and H. F. Mattson, Jr., “On the cosets of the simplex code,” Discrete Math. 56
(1985), 169–189.
[130] I. N. Herstein, Abstract Algebra. New York: Macmillan, 1990.
[131] R. Hill and D. E. Newton, “Optimal ternary linear codes,” Designs, Codes and Crypt. 2 (1992),
137–157.
[132] A. Hocquenghem, “Codes correcteurs d’erreurs,” Chiffres (Paris) 2 (1959), 147–156. (Also
reprinted in [17] pp. 72–74 and [23] pp. 155–164.)
[133] H. Hoeve, J. Timmermans, and L. B. Vries, “Error correction and concealment in the compact
disc system,” Philips Technical Review 40 No. 6 (1982), 166–172.
[134] G. Höhn, “Self-dual codes over the Kleinian four group,” preprint, 1996. See also
http://xxx.lanl.gov/(math.CO/0005266).
[135] T. Høholdt, J. H. van Lint, and R. Pellikaan, “Algebraic geometry codes,” in Handbook of
Coding Theory, eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 871–
961.
[136] I. S. Honkala and H. O. Hämäläinen, “Bounds for abnormal binary codes with covering radius
one,” IEEE Trans. Inform. Theory IT–37 (1991), 372–375.
[137] I. Honkala and A. Tietäväinen, “Codes and number theory,” in Handbook of Coding Theory,
eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 1141–1194.
[138] J. A. van der Horst and T. Berger, “Complete decoding of triple-error-correcting binary BCH
codes,” IEEE Trans. Inform. Theory IT–22 (1976), 138–147.
[139] X. D. Hou, “On the covering radius of subcodes of a code,” IEEE Trans. Inform. Theory IT–37
(1991), 1706–1707.
[140] S. Houghten, C. Lam, and L. Thiel, “Construction of (48, 24, 12) doubly-even self-dual codes,”
Congr. Numer. 103 (1994), 41–53.
[141] S. K. Houghten, C. W. H. Lam, L. H. Thiel, and R. A. Parker, “The extended quadratic residue
code is the only (48, 24, 12) self-dual doubly-even code,” preprint.
[142] W. C. Huffman, “Automorphisms of codes with applications to extremal doubly even codes of
length 48,” IEEE Trans. Inform. Theory IT–28 (1982), 511–521.
[143] W. C. Huffman, “Decomposing and shortening codes using automorphisms,” IEEE Trans.
Inform. Theory IT–32 (1986), 833–836.
[144] W. C. Huffman, “On extremal self-dual quaternary codes of lengths 18 to 28, I,” IEEE Trans.
Inform. Theory IT–36 (1990), 651–660.
[145] W. C. Huffman, “On extremal self-dual quaternary codes of lengths 18 to 28, II,” IEEE Trans.
Inform. Theory IT–37 (1991), 1206–1216.
[146] W. C. Huffman, “On extremal self-dual ternary codes of lengths 28 to 40,” IEEE Trans. Inform.
Theory IT–38 (1992), 1395–1400.
[147] W. C. Huffman, “The automorphism groups of the generalized quadratic residue codes,” IEEE
Trans. Inform. Theory IT–41 (1995), 378–386.
[148] W. C. Huffman, “Characterization of quaternary extremal codes of lengths 18 and 20,” IEEE
Trans. Inform. Theory IT–43 (1997), 1613–1616.
[149] W. C. Huffman, “Codes and groups,” in Handbook of Coding Theory, eds. V. S. Pless and W.
C. Huffman. Amsterdam: Elsevier, 1998, pp. 1345–1440.
[150] W. C. Huffman, V. Job, and V. Pless, “Multipliers and generalized multipliers of cyclic codes
and cyclic objects,” J. Comb. Theory 62A (1993), 183–215.
[151] W. C. Huffman and V. D. Tonchev, “The existence of extremal self-dual [50, 25, 10] codes and
quasi-symmetric 2-(49, 9, 6) designs,” Designs, Codes and Crypt. 6 (1995), 97–106.
[152] W. C. Huffman and V. Y. Yorgov, “A [72, 36, 16] doubly even code does not have an automorphism of order 11,” IEEE Trans. Inform. Theory IT–33 (1987), 749–752.
[153] Y. Ihara, “Some remarks on the number of rational points of algebraic curves over finite fields,”
J. Fac. Sci. Univ. Tokyo Sect. IA Math. 28 (1981), 721–724.
[154] K. A. S. Immink, “Reed–Solomon codes and the compact disc,” in Reed–Solomon Codes and
Their Applications, eds. S. B. Wicker and V. K. Bhargava. New York: IEEE Press, 1994, pp.
41–59.
[155] N. Ito, J. S. Leon, and J. Q. Longyear, “Classification of 3-(24, 12, 5) designs and 24-dimensional Hadamard matrices,” J. Comb. Theory 31A (1981), 66–93.
[156] D. B. Jaffe, “Optimal binary linear codes of length ≤ 30,” Discrete Math. 223 (2000), 135–155.
[157] G. J. Janusz, “Overlap and covering polynomials with applications to designs and self-dual
codes,” SIAM J. Discrete Math. 13 (2000), 154–178.
[158] R. Johannesson and K. Sh. Zigangirov, Fundamentals of Convolutional Coding. New York:
IEEE Press, 1999.
[159] S. M. Johnson, “A new upper bound for error-correcting codes,” IEEE Trans. Inform. Theory
IT–8 (1962), 203–207.
[160] J. Justesen, “A class of constructive asymptotically good algebraic codes,” IEEE Trans. Inform.
Theory IT–18 (1972), 652–656. (Also reprinted in [17] pp. 95–99 and [23] pp. 400–404.)
[161] T. Kasami, “An upper bound on k/n for affine-invariant codes with fixed d/n,” IEEE Trans.
Inform. Theory IT–15 (1969), 174–176.
[162] T. Kasami, S. Lin, and W. Peterson, “Some results on cyclic codes which are invariant under
the affine group and their applications,” Inform. and Control 11 (1968), 475–496.
[163] T. Kasami, S. Lin, and W. Peterson, “New generalizations of the Reed–Muller codes. Part I:
Primitive codes,” IEEE Trans. Inform. Theory IT–14 (1968), 189–199. (Also reprinted in [23]
pp. 323–333.)
[164] G. T. Kennedy, “Weight distributions of linear codes and the Gleason–Pierce theorem,” J. Comb.
Theory 67A (1994), 72–88.
[165] G. T. Kennedy and V. Pless, “On designs and formally self-dual codes,” Designs, Codes and
Crypt. 4 (1994), 43–55.
[166] A. M. Kerdock, “A class of low-rate nonlinear binary codes,” Inform. and Control 20 (1972),
182–187.
[167] J.-L. Kim, “New self-dual codes over GF(4) with the highest known minimum weights,” IEEE
Trans. Inform. Theory IT–47 (2001), 1575–1580.
[168] J.-L. Kim, “New extremal self-dual codes of lengths 36, 38, and 58,” IEEE Trans. Inform.
Theory IT–47 (2001), 386–393.
[169] H. Kimura, “New Hadamard matrix of order 24,” Graphs and Combin. 5 (1989), 235–242.
[170] H. Kimura, “Classification of Hadamard matrices of order 28,” Discrete Math. 133 (1994),
171–180.
[171] H. Kimura and H. Ohnmori, “Classification of Hadamard matrices of order 28,” Graphs and
Combin. 2 (1986), 247–257.
[172] D. J. Kleitman, “On a combinatorial conjecture of Erdös,” J. Comb. Theory 1 (1966), 209–214.
[173] M. Klemm, “Über die Identität von MacWilliams für die Gewichtsfunktion von Codes,” Archiv
Math. 49 (1987), 400–406.
[174] M. Klemm, “Selbstduale Codes über dem Ring der ganzen Zahlen modulo 4,” Archiv Math. 53
(1989), 201–207.
[175] T. Kløve, “Support weight distribution of linear codes,” Discrete Math. 107 (1992), 311–316.
[176] H. Koch, “Unimodular lattices and self-dual codes,” in Proc. Intern. Congress Math., Berkeley
1986, Vol. 1. Providence, RI: Amer. Math. Soc., 1987, pp. 457–465.
[177] H. Koch, “On self-dual, doubly-even codes of length 32,” J. Comb. Theory 51A (1989), 63–76.
[178] R. Kötter, “On algebraic decoding of algebraic-geometric and cyclic codes,” Ph.D. Thesis,
University of Linköping, 1996.
[179] R. Kötter and A. Vardy, “Algebraic soft-decision decoding of Reed–Solomon codes,” preprint.
[180] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inform. Theory IT–47 (2001), 498–519.
[181] C. W. H. Lam, “The search for a finite projective plane of order 10,” Amer. Math. Monthly 98
(1991), 305–318.
[182] C. W. H. Lam, G. Kolesova, and L. Thiel, “A computer search for finite projective planes of
order 9,” Discrete Math. 92 (1991), 187–195.
[183] C. W. H. Lam and V. Pless, “There is no (24, 12, 10) self-dual quaternary code,” IEEE Trans.
Inform. Theory IT–36 (1990), 1153–1156.
[184] C. W. H. Lam, L. Thiel, and S. Swiercz, “The nonexistence of ovals in a projective plane of
order 10,” Discrete Math. 45 (1983), 319–321.
[185] C. W. H. Lam, L. Thiel, and S. Swiercz, “The nonexistence of codewords of weight 16 in a
projective plane of order 10,” J. Comb. Theory 42A (1986), 207–214.
[186] C. W. H. Lam, L. Thiel, and S. Swiercz, “The nonexistence of finite projective planes of order
10,” Canad. J. Math. 41 (1989), 1117–1123.
[187] E. Lander, Symmetric Designs: An Algebraic Approach. London: Cambridge University Press,
1983.
[188] J. Leech, “Notes on sphere packings,” Canadian J. Math. 19 (1967), 251–267.
[189] J. S. Leon, “A probabilistic algorithm for computing minimum weights of large error-correcting
codes,” IEEE Trans. Inform. Theory IT–34 (1988), 1354–1359.
[190] J. S. Leon, J. M. Masley, and V. Pless, “Duadic codes,” IEEE Trans. Inform. Theory IT–30
(1984), 709–714.
[191] J. S. Leon, V. Pless, and N. J. A. Sloane, “On ternary self-dual codes of length 24,” IEEE Trans.
Inform. Theory IT–27 (1981), 176–180.
[192] V. I. Levenshtein, “A class of systematic codes,” Soviet Math. Dokl. 1, No. 1 (1960), 368–
371.
[193] V. I. Levenshtein, “The application of Hadamard matrices to a problem in coding,” Problems
of Cybernetics 5 (1964), 166–184.
[194] V. I. Levenshtein, “Universal bounds for codes and designs,” in Handbook of Coding Theory,
eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 499–648.
[195] W. J. LeVeque, Topics in Number Theory. Reading, MA: Addison-Wesley, 1956.
[196] R. Lidl and H. Niederreiter, Finite Fields. Reading, MA: Addison-Wesley, 1983.
[197] S. Lin and D. J. Costello, Error-Control Coding – Fundamentals and Applications. Englewood
Cliffs, NJ: Prentice-Hall, 1983.
[198] S. Lin and E. J. Weldon, Jr., “Long BCH codes are bad,” Inform. and Control 11 (1967),
445–451.
[199] K. Lindström, “The nonexistence of unknown nearly perfect binary codes,” Ann. Univ. Turku
Ser. A. I 169 (1975), 7–28.
[200] K. Lindström, “All nearly perfect codes are known,” Inform. and Control 35 (1977), 40–47.
[201] J. H. van Lint, “Repeated root cyclic codes,” IEEE Trans. Inform. Theory IT–37 (1991), 343–
345.
[202] J. H. van Lint, “The mathematics of the compact disc,” Mitteilungen der Deutschen
Mathematiker-Vereinigung 4 (1998), 25–29.
[203] J. H. van Lint and R. M. Wilson, “On the minimum distance of cyclic codes,” IEEE Trans.
Inform. Theory IT–32 (1986), 23–40.
[204] J. H. van Lint and R. M. Wilson, A Course in Combinatorics. Cambridge: Cambridge University
Press, 1992.
[205] S. Litsyn, “An updated table of the best binary codes known,” in Handbook of Coding Theory,
eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 463–498.
[206] S. P. Lloyd, “Binary block coding,” Bell System Tech. J. 36 (1957), 517–535. (Also reprinted
in [17] pp. 246–251.)
[207] A. C. Lobstein and V. S. Pless, “The length function, a revised table,” in Lecture Notes in
Computer Science, No. 781. New York: Springer-Verlag, 1994, pp. 51–55.
[208] L. Lovász, “Kneser’s conjecture, chromatic number and homotopy,” J. Comb. Theory 25A
(1978), 319–324.
[209] E. Lucas, “Sur les congruences des nombres euleriennes et des coefficients différentiels des
fonctions trigonométriques, suivant un module premier,” Bull. Soc. Math. (France) 6 (1878),
49–54.
[210] C. C. MacDuffee, Theory of Equations. New York: Wiley & Sons, 1954.
[211] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans.
Inform. Theory IT–45 (1999), 399–431.
[212] F. J. MacWilliams, “Combinatorial problems of elementary abelian groups,” Ph.D. Thesis,
Harvard University, 1962.
[213] F. J. MacWilliams, “A theorem on the distribution of weights in a systematic code,” Bell System
Tech. J. 42 (1963), 79–94. (Also reprinted in [17] pp. 261–265 and [23] pp. 241–257.)
[214] F. J. MacWilliams, “Permutation decoding of systematic codes,” Bell System Tech. J. 43 (1964),
485–505.
[215] F. J. MacWilliams, C. L. Mallows, and N. J. A. Sloane, “Generalizations of Gleason’s theorem
on weight enumerators of self-dual codes,” IEEE Trans. Inform. Theory IT–18 (1972), 794–805.
[216] F. J. MacWilliams and H. B. Mann, “On the p-rank of the design matrix of a difference set,”
Inform. and Control 12 (1968), 474–488.
[217] F. J. MacWilliams, A. M. Odlyzko, N. J. A. Sloane, and H. N. Ward, “Self-dual codes over
GF(4),” J. Comb. Theory 25A (1978), 288–318.
[218] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. New York:
Elsevier/North Holland, 1977.
[219] F. J. MacWilliams, N. J. A. Sloane, and J. G. Thompson, “Good self-dual codes exist,” Discrete
Math. 3 (1972), 153–162.
[220] F. J. MacWilliams, N. J. A. Sloane, and J. G. Thompson, “On the nonexistence of a finite
projective plane of order 10,” J. Comb. Theory 14A (1973), 66–78.
[221] C. L. Mallows, A. M. Odlyzko, and N. J. A. Sloane, “Upper bounds for modular forms, lattices,
and codes,” J. Algebra 36 (1975), 68–76.
[222] C. L. Mallows, V. Pless, and N. J. A. Sloane, “Self-dual codes over GF(3),” SIAM J. Applied
Mathematics 31 (1976), 649–666.
[223] C. L. Mallows and N. J. A. Sloane, “An upper bound for self-dual codes,” Inform. and Control
22 (1973), 188–200.
[224] J. L. Massey, “Shift-register synthesis and BCH decoding,” IEEE Trans. Inform. Theory IT–15
(1969), 122–127. (Also reprinted in [23] pp. 233–238.)
[225] J. L. Massey, “Error bounds for tree codes, trellis codes, and convolutional codes, with encoding
and decoding procedures,” Coding and Complexity, ed. G. Longo, CISM Courses and Lectures
No. 216. New York: Springer, 1977.
[226] J. L. Massey, “The how and why of channel coding,” Proc. 1984 Int. Zurich Seminar on Digital
Communications (1984), 67–73.
[227] J. L. Massey, “Deep space communications and coding: a match made in heaven,” Advanced
Methods for Satellite and Deep Space Communications, ed. J. Hagenauer, Lecture Notes in
Control and Inform. Sci. 182. Berlin: Springer, 1992.
[228] J. L. Massey and D. J. Costello, Jr., “Nonsystematic convolutional codes for sequential decoding
in space applications,” IEEE Trans. Commun. Technol. COM–19 (1971), 806–813.
[229] J. L. Massey and M. K. Sain, “Inverses of linear sequential circuits,” IEEE Trans. Comput.
C–17 (1968), 330–337.
[230] B. R. McDonald, Finite Rings with Identity. New York: Marcel Dekker, 1974.
[231] R. J. McEliece, “Weight congruences for p-ary cyclic codes,” Discrete Math. 3 (1972), 177–
192.
[232] R. J. McEliece, The Theory of Information and Coding. Reading, MA: Addison-Wesley, 1977.
[233] R. J. McEliece, Finite Fields for Computer Scientists and Engineers. Boston: Kluwer Academic
Publishers, 1987.
[234] R. J. McEliece, “On the BCJR trellis for linear block codes,” IEEE Trans. Inform. Theory IT–42
(1996), 1072–1092.
[235] R. J. McEliece, “The algebraic theory of convolutional codes,” in Handbook of Coding Theory,
eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 1065–1138.
[236] R. J. McEliece, E. R. Rodemich, H. Rumsey, Jr., and L. Welch, “New upper bounds on the
rate of a code via the Delsarte–MacWilliams inequalities,” IEEE Trans. Inform. Theory IT–23
(1977), 157–166.
[237] A. McLoughlin, “The covering radius of the (m − 3)rd order Reed–Muller codes and a lower
bound on the covering radius of the (m − 4)th order Reed–Muller codes,” SIAM J. Applied
Mathematics 37 (1979), 419–422.
[238] J. E. Meggitt, “Error-correcting codes for correcting bursts of errors,” IBM J. Res. Develop. 4
(1960), 329–334.
[239] J. E. Meggitt, “Error-correcting codes and their implementation,” IRE Trans. Inform. Theory
IT–6 (1960), 459–470.
[240] C. Moreno, Algebraic Curves over Finite Fields. Cambridge Tracts in Math. 97. Cambridge:
Cambridge University Press, 1991.
[241] D. E. Muller, “Application of Boolean algebra to switching circuit design and to error detection,”
IRE Trans. Electronic Computers EC–3 (1954), 6–12. (Also reprinted in [17] pp. 20–26 and [23] pp. 43–49.)
[242] J. Mykkeltveit, “The covering radius of the (128, 8) Reed–Muller code is 56,” IEEE Trans.
Inform. Theory IT–26 (1980), 359–362.
[243] A. A. Nechaev, “The Kerdock code in a cyclic form,” Diskret. Mat. 1 (1989), 123–139. (English
translation in Discrete Math. Appl. 1 (1991), 365–384.)
[244] A. Neumaier, private communication, 1990.
[245] R. R. Nielsen and T. Høholdt, “Decoding Reed–Solomon codes beyond half the minimum
distance,” in Coding Theory, Cryptography, and Related Areas (Guanajuato, 1998), eds. J.
Buchmann, T. Høholdt, H. Stichtenoth, and H. Tapia-Recillas. Berlin: Springer, 2000, pp.
221–236.
[246] H. V. Niemeier, “Definite Quadratische Formen der Dimension 24 und Diskriminante 1,” J.
Number Theory 5 (1973), 142–178.
[247] A. W. Nordstrom and J. P. Robinson, “An optimum nonlinear code,” Inform. and Control 11
(1967), 613–616. (Also reprinted in [17] p. 101 and [23] pp. 358–361.)
[248] V. Olshevsky and A. Shokrollahi, “A displacement structure approach to efficient decoding of
Reed–Solomon and algebraic-geometric codes,” Proc. 31st ACM Symp. Theory of Computing
(1999), 235–244.
[249] J. Olsson, “On the quaternary [18, 9, 8] code,” Proceedings of the Workshop on Coding and
Cryptography, WCC99-INRIA Jan. 10–14, 1999, pp. 65–73.
[250] N. J. Patterson and D. H. Wiedemann, “The covering radius of the (2^15, 16) Reed–Muller code
is at least 16 276,” IEEE Trans. Inform. Theory IT–29 (1983), 354–356.
[251] N. J. Patterson and D. H. Wiedemann, “Correction to ‘The covering radius of the (2^15, 16)
Reed–Muller code is at least 16 276’,” IEEE Trans. Inform. Theory IT–36 (1990), 443.
[252] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San
Mateo, CA: Morgan Kaufmann, 1988.
[253] J. B. H. Peek, “Communications aspects of the compact disc digital audio system,” IEEE
Communications Mag. 23 (1985), 7–15.
[254] W. W. Peterson, “Encoding and error-correction procedures for the Bose–Chaudhuri codes,”
IRE Trans. Inform. Theory IT–6 (1960), 459–470. (Also reprinted in [17] pp. 109–120 and [23]
pp. 221–232.)
[255] W. W. Peterson, Error-Correcting Codes. Cambridge, MA: MIT Press, 1961.
[256] W. W. Peterson and E. J. Weldon, Jr., Error-Correcting Codes, 2nd ed. Cambridge, MA.: MIT
Press, 1972.
[257] P. Piret, Convolutional Codes: An Algebraic Approach. Cambridge, MA: MIT Press, 1988.
[258] V. Pless, “Power moment identities on weight distributions in error correcting codes,”
Inform. and Control 6 (1963), 147–152. (Also reprinted in [17] pp. 266–267 and [23]
pp. 257–262.)
[259] V. Pless, “The number of isotropic subspaces in a finite geometry,” Rend. Cl. Scienze fisiche,
matematiche e naturali, Acc. Naz. Lincei 39 (1965), 418–421.
[260] V. Pless, “On the uniqueness of the Golay codes,” J. Comb. Theory 5 (1968), 215–228.
[261] V. Pless, “The weight of the symmetry code for p = 29 and the 5-designs contained therein,”
Annals N. Y. Acad. of Sciences 175 (1970), 310–313.
[262] V. Pless, “A classification of self-orthogonal codes over GF(2),” Discrete Math. 3 (1972),
209–246.
[263] V. Pless, “Symmetry codes over GF(3) and new 5-designs,” J. Comb. Theory 12 (1972), 119–
142.
[264] V. Pless, “The children of the (32, 16) doubly even codes,” IEEE Trans. Inform. Theory IT–24
(1978), 738–746.
[265] V. Pless, “23 does not divide the order of the group of a (72, 36, 16) doubly-even code,” IEEE
Trans. Inform. Theory IT–28 (1982), 112–117.
[266] V. Pless, “Q-Codes,” J. Comb. Theory 43A (1986), 258–276.
[267] V. Pless, “Cyclic projective planes and binary, extended cyclic self-dual codes,” J. Comb. Theory
43A (1986), 331–333.
[268] V. Pless, “Decoding the Golay codes,” IEEE Trans. Inform. Theory IT–32 (1986), 561–567.
[269] V. Pless, “More on the uniqueness of the Golay code,” Discrete Math. 106/107 (1992), 391–398.
[270] V. Pless, “Duadic codes and generalizations,” in Proc. of Eurocode 1992, Udine, Italy, CISM
Courses and Lectures No. 339, eds. P. Camion, P. Charpin, and S. Harari. Vienna: Springer,
1993, pp. 3–16.
[271] V. Pless, “Parents, children, neighbors and the shadow,” Contemporary Math. 168 (1994),
279–290.
[272] V. Pless, “Constraints on weights in binary codes,” Applicable Algebra in Engineering, Communication and Computing (AAECC Journal) 8 (1997), 411–414.
[273] V. Pless, Introduction to the Theory of Error-Correcting Codes, 3rd ed. New York: J. Wiley &
Sons, 1998.
[274] V. Pless, “Coding constructions,” in Handbook of Coding Theory, eds. V. S. Pless and W. C.
Huffman. Amsterdam: Elsevier, 1998, pp. 141–176.
[275] V. S. Pless, W. C. Huffman, and R. A. Brualdi, “An introduction to algebraic codes,” in Handbook
of Coding Theory, eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 3–139.
[276] V. Pless, J. Leon, and J. Fields, “All Z4 codes of Type II and length 16 are known,” J. Comb.
Theory 78A (1997), 32–50.
[277] V. Pless, J. M. Masley, and J. S. Leon, “On weights in duadic codes,” J. Comb. Theory 44A
(1987), 6–21.
[278] V. Pless and J. N. Pierce, “Self-dual codes over GF(q) satisfy a modified Varshamov–Gilbert
bound,” Inform. and Control 23 (1973), 35–40.
[279] V. Pless and Z. Qian, “Cyclic codes and quadratic residue codes over Z4 ,” IEEE Trans. Inform.
Theory IT–42 (1996), 1594–1600.
[280] V. Pless and J. J. Rushanan, “Triadic Codes,” Lin. Alg. and Its Appl. 98 (1988), 415–433.
[281] V. Pless and N. J. A. Sloane, “On the classification and enumeration of self-dual codes,”
J. Comb. Theory 18 (1975), 313–335.
[282] V. Pless, N. J. A. Sloane, and H. N. Ward, “Ternary codes of minimum weight 6 and the
classification of self-dual codes of length 20,” IEEE Trans. Inform. Theory IT–26 (1980),
305–316.
[283] V. Pless, P. Solé, and Z. Qian, “Cyclic self-dual Z4 -codes,” Finite Fields and Their Appl. 3
(1997), 48–69.
[284] V. Pless and J. Thompson, “17 does not divide the order of the group of a (72, 36, 16) doubly-even code,” IEEE Trans. Inform. Theory IT–28 (1982), 537–544.
[285] M. Plotkin, “Binary codes with specified minimum distances,” IRE Trans. Inform. Theory IT–6
(1960), 445–450. (Also reprinted in [17] pp. 238–243.)
[286] K. C. Pohlmann, Principles of Digital Audio, 4th ed. New York: McGraw-Hill, 2000.
[287] A. Pott, “Applications of the DFT to abelian difference sets,” Archiv Math. 51 (1988), 283–288.
[288] F. P. Preparata, “A class of optimum nonlinear double-error correcting codes,” Inform. and
Control 13 (1968), 378–400. (Also reprinted in [23] pp. 366–388.)
[289] Z. Qian, “Cyclic codes over Z4 ,” Ph.D. Thesis, University of Illinois at Chicago, 1996.
[290] E. M. Rains, “Shadow bounds for self-dual codes,” IEEE Trans. Inform. Theory IT–44 (1998),
134–139.
[291] E. M. Rains and N. J. A. Sloane, “Self-dual codes,” in Handbook of Coding Theory, eds. V. S.
Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 177–294.
[292] E. M. Rains and N. J. A. Sloane, “The shadow theory of modular and unimodular lattices,” J.
Number Theory 73 (1998), 359–389.
[293] I. S. Reed, “A class of multiple-error-correcting codes and the decoding scheme,” IRE Trans.
Inform. Theory IT–4 (1954), 38–49. (Also reprinted in [17] pp. 27–38 and [23] pp. 50–61.)
[294] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,” J. SIAM 8 (1960),
300–304. (Also reprinted in [17] pp. 70–71 and [23] pp. 189–193.)
[295] T. J. Richardson and R. Urbanke, “The capacity of low-density parity-check codes under
message-passing decoding,” IEEE Trans. Inform. Theory IT–47 (2001), 599–618.
[296] C. Roos, “A generalization of the BCH bound for cyclic codes, including the Hartmann–Tzeng
bound,” J. Comb. Theory 33A (1982), 229–232.
[297] C. Roos, “A new lower bound on the minimum distance of a cyclic code,” IEEE Trans. Inform.
Theory IT–29 (1983), 330–332.
[298] R. M. Roth, personal communication.
[299] R. M. Roth and A. Lempel, “On MDS codes via Cauchy matrices,” IEEE Trans. Inform. Theory
IT–35 (1989), 1314–1319.
[300] R. M. Roth and G. Ruckenstein, “Efficient decoding of Reed–Solomon codes beyond half the
minimum distance,” IEEE Trans. Inform. Theory IT–46 (2000), 246–257.
[301] J. J. Rushanan, “Generalized Q-codes,” Ph.D. Thesis, California Institute of Technology, 1986.
[302] C. J. Salwach, “Planes, biplanes, and their codes,” American Math. Monthly 88 (1981), 106–125.
[303] N. V. Semakov and V. A. Zinov’ev, “Complete and quasi-complete balanced codes,” Problems
of Inform. Trans. 5(2) (1969), 11–13.
[304] N. V. Semakov and V. A. Zinov’ev, “Balanced codes and tactical configurations,” Problems of
Inform. Trans. 5(3) (1969), 22–28.
[305] N. V. Semakov, V. A. Zinov’ev, and G. V. Zaitsev, “Uniformly packed codes,” Problems of
Inform. Trans. 7(1) (1971), 30–39.
[306] C. Shannon, “A mathematical theory of communication,” Bell System Tech. J. 27 (1948), 379–
423 and 623–656.
[307] K. Shum, I. Aleshnikov, P. V. Kumar, and H. Stichtenoth, “A low complexity algorithm for
the construction of algebraic geometry codes better than the Gilbert–Varshamov bound,” Proc.
38th Allerton Conf. on Commun. Control and Computing (October 4–6, 2000), 1031–1037.
[308] J. Simonis, “On generator matrices of codes,” IEEE Trans. Inform. Theory IT–38 (1992), 516.
[309] J. Simonis, “The [18, 9, 6] code is unique,” Discrete Math. 106/107 (1992), 439–448.
[310] J. Simonis, “The effective length of subcodes,” Applicable Algebra in Engineering, Communication and Computing (AAECC Journal) 5 (1994), 371–377.
[311] J. Simonis, “Subcodes and covering radius: a generalization of Adam’s result,” unpublished.
[312] R. C. Singleton, “Maximum distance q-ary codes,” IEEE Trans. Inform. Theory IT–10 (1964),
116–118.
[313] N. J. A. Sloane, “Weight enumerators of codes,” in Combinatorics, eds. M. Hall, Jr. and J. H.
van Lint. Dordrecht, Holland: Reidel Publishing, 1975, 115–142.
[314] N. J. A. Sloane, “Relations between combinatorics and other parts of mathematics,” Proc.
Symposia Pure Math. 34 (1979), 273–308.
[315] M. H. M. Smid, “Duadic codes,” IEEE Trans. Inform. Theory IT–33 (1987), 432–433.
[316] K. J. C. Smith, “On the p-rank of the incidence matrix of points and hyperplanes in a finite
projective geometry,” J. Comb. Theory 7 (1969), 122–129.
[317] S. L. Snover, “The uniqueness of the Nordstrom–Robinson and the Golay binary codes,” Ph.D.
Thesis, Michigan State University, 1973.
[318] P. Solé, “A quaternary cyclic code, and a family of quadriphase sequences with low correlation
properties,” Lecture Notes in Computer Science 388 (1989), 193–201.
[319] G. Solomon and J. J. Stiffler, “Algebraically punctured cyclic codes,” Inform. and Control 8
(1965), 170–179.
[320] H. Stichtenoth, Algebraic Function Fields and Codes. New York: Springer-Verlag, 1993.
[321] L. Storme and J. A. Thas, “M.D.S. codes and arcs in PG(n, q) with q even: an improvement
on the bounds of Bruen, Thas, and Blokhuis,” J. Comb. Theory 62A (1993), 139–154.
[322] R. Struik, “On the structure of linear codes with covering radius two and three,” IEEE Trans.
Inform. Theory IT–40 (1994), 1406–1416.
[323] M. Sudan, “Decoding of Reed–Solomon codes beyond the error-correction bound,” J. Complexity 13 (1997), 180–193.
[324] Y. Sugiyama, M. Kasahara, S. Hirasawa, and T. Namekawa, “A method for solving a key
equation for decoding Goppa codes,” Inform. and Control 27 (1975), 87–99.
[325] R. M. Tanner, “A recursive approach to low-complexity codes,” IEEE Trans. Inform. Theory
IT–27 (1981), 533–547.
[326] L. Teirlinck, “Nontrivial t-designs without repeated blocks exist for all t,” Discrete Math. 65
(1987), 301–311.
[327] A. Thue, “Über die dichteste Zusammenstellung von kongruenten Kreisen in einer Ebene,”
Norske Vid. Selsk. Skr. No. 1 (1910), 1–9.
[328] H. C. A. van Tilborg, “On the uniqueness resp. nonexistence of certain codes meeting the
Griesmer bound,” Inform. and Control 44 (1980), 16–35.
[329] H. C. A. van Tilborg, “Coding theory at work in cryptology and vice versa,” in Handbook of
Coding Theory, eds. V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 1195–1227.
[330] J. A. Todd, “A combinatorial problem,” J. Math. Phys. 12 (1933), 321–333.
[331] V. D. Tonchev, Combinatorial Configurations. New York: Longman-Wiley, 1988.
[332] V. D. Tonchev, “Codes and designs,” in Handbook of Coding Theory, eds. V. S. Pless and W.
C. Huffman. Amsterdam: Elsevier, 1998, pp. 1229–1267.
[333] M. A. Tsfasman, S. G. Vlăduţ, and T. Zink, “Modular curves, Shimura curves and Goppa codes,
better than Varshamov–Gilbert bound,” Math. Nachrichten 109 (1982), 21–28.
[334] J. V. Uspensky, Theory of Equations. New York: McGraw-Hill, 1948.
[335] S. A. Vanstone and P. C. van Oorschot, An Introduction to Error Correcting Codes with Applications. Boston: Kluwer Academic Publishers, 1989.
[336] A. Vardy, “Trellis structure of codes,” in Handbook of Coding Theory, eds. V. S. Pless and W.
C. Huffman. Amsterdam: Elsevier, 1998, pp. 1989–2117.
[337] R. R. Varshamov, “Estimate of the number of signals in error correcting codes,” Dokl. Akad.
Nauk SSSR 117 (1957), 739–741. (English translation in [23] pp. 68–71.)
[338] J. L. Vasil’ev, “On nongroup close-packed codes,” Probl. Kibernet. 8 (1962), 375–378. (English
translation in [23] pp. 351–357.)
[339] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Trans. Inform. Theory IT–13 (1967), 260–269. (Also reprinted in [17]
pp. 195–204.)
[340] S. G. Vlăduţ and A. N. Skorobogatov, “Covering radius for long BCH codes,” Problemy
Peredachi Informatsii 25(1) (1989), 38–45. (English translation in Problems of Inform. Trans.
25(1) (1989), 28–34.)
[341] L. B. Vries and K. Odaka, “CIRC – the error correcting code for the compact disc,” in Digital
Audio (Collected papers from the AES Premier Conference, Rye, NY, June 3–6, 1982), eds.
B. A. Blesser et al. Audio Engineering Society Inc., 1983, 178–186.
[342] B. Vucetic and J. Yuan, Turbo Codes: Principles and Applications. Norwell, MA: Kluwer
Academic Publishers, 2000.
[343] J. L. Walker, Codes and Curves. Student Mathematical Library Series 7. Providence, RI: American Mathematical Society, 2000.
[344] H. N. Ward, “A restriction on the weight enumerator of self-dual codes,” J. Comb. Theory 21
(1976), 253–255.
[345] H. N. Ward, “Divisible codes,” Archiv Math. (Basel) 36 (1981), 485–494.
[346] H. N. Ward, “Divisibility of codes meeting the Griesmer bound,” J. Comb. Theory 83A (1998),
79–93.
[347] H. N. Ward, “Quadratic residue codes and divisibility,” in Handbook of Coding Theory, eds.
V. S. Pless and W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 827–870.
[348] V. K. Wei, “Generalized Hamming weights for linear codes,” IEEE Trans. Inform. Theory
IT–37 (1991), 1412–1418.
[349] E. J. Weldon, Jr., “New generalizations of the Reed–Muller codes. Part II: Nonprimitive codes,”
IEEE Trans. Inform. Theory IT–14 (1968), 199–205. (Also reprinted in [23] pp. 334–340.)
[350] N. Wiberg, H.-A. Loeliger, and R. Kötter, “Codes and iterative decoding on general graphs,”
Euro. Trans. Telecommun. 6 (1995), 513–526.
[351] S. B. Wicker, Error Control Systems for Digital Communication and Storage. Englewood Cliffs,
NJ: Prentice-Hall, 1995.
[352] S. B. Wicker, “Deep space applications,” in Handbook of Coding Theory, eds. V. S. Pless and
W. C. Huffman. Amsterdam: Elsevier, 1998, pp. 2119–2169.
[353] H. A. Wilbrink, “A note on planar difference sets,” J. Comb. Theory 38A (1985), 94–95.
[354] X.-W. Wu and P. H. Siegel, “Efficient root-finding algorithm with applications to list decoding
of algebraic-geometric codes,” IEEE Trans. Inform. Theory IT–47 (2001), 2579–2587.
Symbol index
⊥, 5, 275
⊥H , 7
', 165
/, 533
(a/p), 219
(g(x)), 125
g(x), 125, 481
⟨x, y⟩, 7
⟨x, y⟩T, 383
αq (δ), 89, 541
ΓAut(C), 26, 384
ΓAutPr(C), 28
Γ(L, G), 521
µ , 187
(n, k), 567
ΘΛ(q), 426
Λ2, 423
Λ24, 429
Λ∗, 424
Λ(C), 427
Λ4(C), 503
(x, y), 196
λi, 293
λi^j, 295
λi(x), 609
µa , 138
ν(C), 465
ν(P), 584
ν(si , s ′j ), 584
πi^(n)(x), 608, 610
ρ(C), 50, 432
ρBCH (t, m), 440
ρRM (r , m), 437
σ p , 111
σ (x), 181
σ^(µ)(x), 187
φs , 164
φs : Fq [I] → I, 164
ω(x), 190
A , 309
AT , 4
Ai (C), 8, 252
A(n, d), 48
Aq (n, d), 48
Aq (n, d, w), 60
An (F), 517
Aut(P, B), 292
Bi (C), 75
B(n, d), 48
Bq (n, d), 48
B(G), 564
B(t, m), 439
C 1 ⊕ C 2 , 18
C 1 ≃ C 2 , 368
C i /C j , 353
C i ⊥C j , 353
C ∗ , 13
C|Fq , 116
C c , 145
Ĉ, 14
C (L) , 565
C ⊥ , 5, 469
C⊥H , 7
C ⊥T , 384
C q (A), 308
Cs , 114, 122
C T , 14
C T , 16
C(X , P, D), 535
d2m , 367
dfree, 563
dr (C), 283
Dq (A), 309
d(C, x), 65
d(x, y), 7
d E (x, y), 470
d H (x, y), 470
d L (x, y), 470
deg f (x), 101
det Λ, 423
Dext , 249
e7 , 367
e8 , 367
E 8 , 428
E b , 577
E n , 209
E s , 573
evP , 534
extdeg G, 558
F2 , 3
F3 , 3
F4 , 3
F9 , 243
F16 , 184
Fq , 2, 100
Fqn , 3
Fq [x], 101
F[x1 , . . . , xn ], 519
f H , 519
f n , 373
f (x) | g(x), 102
G6, 7
G 11 , 33
G 12 , 32
G 23 , 32
G 24 , 31
g24 , 374
GA1 (I), 165
GA3 (2), 368
Gal(Fq ), 112
Gal(Fq : F pr ), 112
G(C), 371
G 0 (C), 372
G 1 (C), 372
G 2 (C), 371
G(C), 472
gcd( f (x), g(x)), 102
GF(q), 2, 100
GR(4^r), 505
GRSk (γ, v), 176, 196, 520
H3 , 5
Ĥ3, 6
Hr , 29
H2,r , 29
H3,2 , 6
Hq,r , 29
Hq (x), 90
HamC (x, y), 470
i 2 , 367
I (C, t), 203
intdeg G, 559
Jv , 309
j(x), 209
K(r + 1), 509
Kk^(n,q)(x), 75, 256
L(D), 534
Lk−1 , 520
ℓq (m, r ), 447
LeeC (x,y), 448
M12 , 419
M23 , 402
M24 , 402
MAut(C), 26, 469
MAutPr (C), 28
mC, 368
N0 , 575
N16, 68
n a (x), 470
(n, 2^k, d), 383
(n, c, r ), 598
(n, k, m), 561
(n, k, m, d), 563
(n, M, d), 53
N p , 237
Ns (δ), 195
N (v), 426
ordn (q), 122
PAut(C), 22, 469
(P, B), 291
Pb , 578
Perr , 46
PG(2, 2^s), 319
PG(r − 1, q), 29
PGL2 (q), 422
P k , 174
P(r + 1), 515
prob(E 1 ), 39
prob(E 1 | E 2 ), 39
Pr(v), 407
PSL2 (7), 22, 368
PSL2 (23), 402
Q p , 237
Res(C), 496
Res(C, c), 80
Resγi f , 524
Rn , 121, 124, 209
Rn , 480
R(r, m), 34
Rq (r, m), 524
Rt , 577
span{x, y}, 275
S(q), 420
S(r, ν), 256
Sr (u), 40
S(t, k, v), 291
supp(c), 120
supp(D), 283
Symn , 21
tq (n, k), 447
Tor(C), 496
T (R), 506
TRr , 508
Trt , 119
Trt (C), 119
Vq (n, a), 74
WC (x), 255
WC (x, y), 255
wt(x), 8
wt E (x), 470
wt H (x), 470
wt L (x), 470
X 1 ∩ X 2 , 531
X f (F), 526
x j (i), 587
x⊥y, 275
x ∩ y, 8
x · y, 6, 469
Zq , 76
Subject index
a priori term, 609
α notation, 608
A-plane, 275
acceptable coordinate, 451
Adams Bound, 455
additive code over F4 , 383
automorphism group, 384
Balance Principle, 387
dodecacode, 388
equivalent, 384
generator matrix, 383
hexacode, 383
mass formula, 386
trace dual, 384
trace inner product, 383
trace self-dual, 384
trace self-orthogonal, 384
Type I, 385
extremal, 386
Type II, 385
extremal, 386
Gleason’s Theorem, 385
additive white Gaussian noise (AWGN), 575
adjoining a root, 108
affine group, 165, 251, 366, 368
affine plane curve, 526
affine space, 517
affine-invariant code, 162, 165
extended BCH, 172
AG, 535
algebraic geometry (AG) code C(X , P, D), 535
dimension, 535
dual, 541
exceed Asymptotic Gilbert–Varshamov Bound, 544
generalized Reed–Solomon code as, 537
generator matrix, 535
minimum distance, 535
Reed–Solomon code as, 536
algorithm
Berlekamp–Massey Decoding, 186, 188
Classification, 366
Division, 102
Euclidean, 102
Gallager Hard Decision Decoding, 599
General Viterbi, 584, 586
Meggitt Decoding, 158–160
Message Passing Decoding, 595
Permutation Decoding, 402, 403
Peterson–Gorenstein–Zierler Decoding, 179, 182
Soft Decision Viterbi Decoding, 580, 581, 612
Sudan–Guruswami Decoding, 195, 196
Sugiyama Decoding, 190, 191
Sum-Product Decoding, 602
Syndrome Decoding, 42, 43
Turbo Decoding, 610
Two-Way a Posteriori Probability (APP) Decoding,
587, 592
Viterbi Decoding, 551, 556
amalgamated direct sum (ADS), 452
ancestor, 460
APP, 587
Assmus–Mattson Theorem, 303
asymptotic bound, 88
Elias, 93
First MRRW, 94
Gilbert–Varshamov, 94, 541
exceeded by algebraic geometry codes, 544
met by Goppa codes, 542
Hamming, 92
Plotkin, 89
Second MRRW, 94
Singleton, 89
Tsfasman–Vlăduţ–Zink, 544
asymptotically bad code, 173, 541
asymptotically good code, 173, 542
automorphism group, 22, 26, 384
monomial, 26, 469
of a design, 292
permutation, 22, 469
transitive, 23, 28, 271, 308
automorphism of a design, 292
automorphism of a field, 111
fixed element, 112
Frobenius, 112
Galois group, 112
AWGN, 575
Balance Principle, 351, 379, 387
bandwidth, 577
basic generator matrix, 559
basis of minimum weight codewords, 83, 85
BCH Bound, 151
BCH code, 168
Berlekamp–Massey Decoding Algorithm, 186, 188
Bose distance, 171
covering radius, 440, 441, 443, 444
designed distance, 168
dimension, 170
minimum distance, 171
narrow-sense, 168, 521
nested, 169
Peterson–Gorenstein–Zierler Decoding Algorithm,
179, 182
primitive, 168
affine-invariant extension, 172
Reed–Solomon code, see Reed–Solomon code
Sugiyama Decoding Algorithm, 190, 191
BER, 578
Berlekamp–Massey Decoding Algorithm, 186, 188
Bézout’s Theorem, 531
Big Viterbi Decoder, 613
binary adder, 129
binary field, 3
binary phase-shift keying (BPSK), 573
binary symmetric channel (BSC), 39, 583
crossover probability of, 39, 583
bit, 202
bit error rate (BER), 578
block, 291
bordered circulant matrix, 31, 376
bordered reverse circulant matrix, 377
bound, 48
Aq (n, d), 48
Aq (n, d, w), 60
Bq (n, d), 48
Adams, 455
asymptotic, see asymptotic bound
BCH, 151
Calderbank, 457
Delsarte, 440
Elias, 74
Generalized Griesmer, 287
Generalized Singleton, 286
Gilbert, 86
Griesmer, 81
Hamming, 48
Hartmann–Tzeng, 153
Hou, 458
Johnson, 65, 74
restricted, 61
unrestricted, 63
Linear Programming, 78
meet, 53
MRRW, 94
Norse, 435
on maximum distance separable code, 264
Plotkin, 58
Redundancy, 433
Singleton, 71
Sphere Covering, 434
Sphere Packing, 48, 59, 74
Square Root, 230
Supercode Lemma, 434
van Lint–Wilson Bounding Technique, 154
Varshamov, 88
BPSK, 573
Bruck–Ryser–Chowla Theorem, 319
BSC, 39
burst, 202
byte, 202
Calderbank Bound, 457
canonical generator matrix, 558
Cassini, 602, 614
catastrophic generator matrix, 569
CCSDS, 612
CD, 203
Challenger, 613
channel, 1, 573
binary symmetric, 39, 583
capacity of, 1, 47, 577
discrete memoryless, 39
noisy, 1
statistics, 576, 587
channel capacity, 1, 47, 577
characteristic, 100
child, 358, 375, 460
CIRC, 204
circulant matrix, 376
Classification Algorithm, 366
classification problem, 365
clock cycle, 129, 549, 573
code, 3
additive, see additive code over F4
affine-invariant, see affine-invariant code
algebraic geometry, see algebraic geometry (AG)
code
asymptotically bad, 173, 541
asymptotically good, 173, 542
automorphism group, 26
BCH, see BCH code
binary, 3
block, 546
bordered double circulant construction, 376
bordered double circulant generator matrix, 376
bordered reverse circulant construction, 377
bordered reverse circulant generator matrix, 377
burst error-correcting, 202
complement of, 145
component, 370
concatenated, 201
constant weight, 60, 282
convolutional, see convolutional code
covering radius of, see covering radius
cyclic, see cyclic code
decomposable, 368
direct sum, 18
weight enumerator of, 255
divisible, 11
divisor of, 11, 86, 157
double circulant construction, 376
double circulant generator matrix, 132, 376
doubly-even, 12, 150, 361
duadic, see duadic code
dual, 5, 469
equivalent, 25
even, 11
even-like, 12, 210
extended, 14
extremal, 346
formally self-dual, see formally self-dual code
generalized Hamming weight, see generalized
Hamming weight
generator matrix, 4
standard form of, 4, 21
Golay, binary, see Golay codes, binary
Golay, ternary, see Golay codes, ternary
Goppa, see Goppa code
Hamming, see Hamming code
Hermitian self-dual, see Hermitian self-dual code
Hermitian self-orthogonal, 7
hexacode, see hexacode
hold a design, 293
homogeneous, 271
hull of, 275
indecomposable, 368
information set, 4
inner, 201
interleaved, 203
isodual, 378
Kerdock, see Kerdock code
lattice from, 427, 503
lexicode, 97
linear, 2, 4
low density parity check, see low density parity
check code
maximum distance separable, see maximum
distance separable code
minimum distance, 8
minimum support weight, see generalized
Hamming weight
minimum weight, 8
monomial automorphism group, 26
monomially equivalent, 24, 281
nearly perfect, binary, 69
nonlinear, 53
Nordstrom–Robinson, see Nordstrom–Robinson
code
normal, 452
odd-like, 12, 210
optimal, 53
orthogonal, 5
orthogonal sum, 276
outer, 201
packing radius of, 41
parallel concatenated convolutional, 604
parity check matrix, 4
perfect, 48, 49
permutation automorphism group, 22
permutation equivalent, 20
Pless symmetry, see Pless symmetry code
Preparata, see Preparata code
punctured, 13
quadratic residue, see quadratic residue code
quasi-cyclic, 131
quasi-perfect, 50
quaternary, 3
quick-look-in, 612
rate of, 47
redundancy set, 4
Reed–Muller, see Reed–Muller code
Reed–Solomon, see Reed–Solomon code
repetition, 4
replicated, 390
residual, 80
residue, 496
reverse circulant construction, 377
reverse circulant generator matrix, 377
self-complementary, 435
self-dual, see self-dual code
self-orthogonal, 6, 310, 340, 360, 363, 469
shortened, 16
simplex, see simplex code
singly-even, 12
strength of, 435
subfield subcode, see subfield subcode
sum, 135
Tanner graph of, 593
ternary, 3
t-error-correcting, 41
tetracode, see tetracode
torsion, 496
trace of, see trace code
turbo, see turbo code
weight distribution of, see weight distribution
weight enumerator of, see weight enumerator
weight hierarchy of, 283, 284
weight spectrum of, see weight distribution
Z4 -linear, see Z4 -linear code
codeword, 3
codeword associated with a path, 555
codeword error, 569
coding gain, 578
commutative ring with unity, 101
compact disc (CD) recorder, 203
decoding, 207
encoding, 204
complement of a vector, 333
complete coset weight distribution, 45
complete decoding, 40
component, 370
conjugate, 7
conjugate elements, 114
conjugation, 7
constant weight code, 60, 282
constraint length, 561
Construction A, 427
Construction A4 , 503
Consultative Committee for Space Data Systems,
612
convolutional code, 546, 612
basic generator matrix, 559
canonical generator matrix, 558
catastrophic generator matrix, 569
constraint length, 561
degree of, 558
delay, 548
encoder, 547
recursive systematic, 607
standard form of, 607
state of, 551–553
systematic, 607
external degree, 558
Forney indices, 561
free distance, 563
General Viterbi Algorithm, 584, 586
generator matrix, 547
systematic, 607
internal degree, 559
memory, 546, 548, 553, 561
overall constraint length, 561
polynomial generator matrix, 547
predictable degree property, 559
quick-look-in, 612
rate of, 546
reduced generator matrix, 559
Soft Decision Viterbi Decoding Algorithm, 580,
581, 612
state diagram, 551, 552
trellis diagram, 554
codeword associated with a path, 555
message associated with a path, 555
survivor path, 555
truncated, 555
weight of a path, 555
weight of an edge, 555
Two-Way a Posteriori Probability (APP) Decoding
Algorithm, 587, 592
Viterbi Decoding Algorithm, 551, 556
coordinate functional, 390
coset, 41
complete weight distribution of, 45
cyclotomic, 114, 122
leader, see coset leader
of nonlinear code, 433
weight distribution of, 265
weight of, 41
coset leader, 42, 51, 434
ancestor, 460
child, 460
descendant, 460
orphan, 460
parent, 460
cover, 459
covering radius ρ(C), 50, 51, 57, 265, 432
Adams Bound, 455
amalgamated direct sum (ADS), 452
BCH code, 440, 441, 443, 444
Calderbank Bound, 457
Delsarte Bound, 440
Hamming code, 448
Hou Bound, 458
length function, 447
Norse Bounds, 435
Redundancy Bound, 433
Reed–Muller code, 437, 438
self-dual code, 444
simplex code, 439
Sphere Covering Bound, 434
subcode, 454
Supercode Lemma, 434
Cross-Interleaved Reed–Solomon Code (CIRC),
204
crossover probability, 39, 583
cyclic code, 121
BCH, see BCH code
BCH Bound, 151
check polynomial, 146
complement, 145, 211
defining set, 145
generating idempotent, 145
generator polynomial, 145
cyclic shift, 121
defining set, 142, 144
divisor of, 157
duadic, see duadic code
dual, 127, 146
defining set, 146
generating idempotent, 146
generator polynomial, 146
nonzeros, 146
encoding, 128
equivalence, 141
class, 144, 233, 365
extended, 162, 229
defining set, 164
generating idempotent, 132, 135
computing from generator polynomial, 133
generator matrix from, 134
generator matrix, 125
generator polynomial, 126, 135, 144
computing from generating idempotent, 133
Hartmann–Tzeng Bound, 153
Hermitian dual, 149
defining set, 149
generating idempotent, 149
Hermitian self-orthogonal, 149
irreducible, 150
Meggitt Decoding Algorithm, 158–160
minimum weight, 151, 153
nonzeros, 142
over Z4 , see cyclic code over Z4
parity check matrix, 127, 143
permutation automorphism group of, 139
primitive, 162
primitive idempotent, 136
quadratic residue, see quadratic residue code
self-orthogonal, 147
defining set, 147
generator polynomial, 148
subfield subcode, 128
van Lint–Wilson Bounding Technique, 154
zeros, 142
cyclic code over Z4 , 475
dual, 485
generating idempotents, 486
generator polynomials, 485
generating idempotents, 485–487
generator polynomials, 482
quadratic residue, 490, 492
extended, 492
Leech lattice from, 505
self-dual, 502
cyclotomic coset, 114, 122
q-, 114
decoding, 39
Berlekamp–Massey Algorithm, 186, 188
compact disc, 207
complete, 40
erasure, 44
Gallager Hard Decision Algorithm, 599
General Viterbi Algorithm, 584, 586
hard decision, 573
hexacode decoding of Golay code, 407
iterative, 593, 598, 599, 602, 607, 610
list-decoding, 195
maximum a posteriori probability (MAP), 39
maximum likelihood (ML), 40, 580
Meggitt Algorithm, 158–160
Message Passing Algorithm, 595
nearest neighbor, 40
Permutation Decoding Algorithm, 402, 403
Peterson–Gorenstein–Zierler Algorithm, 179, 182
soft decision, 573
Soft Decision Viterbi Algorithm, 580, 581, 612
Sudan–Guruswami Algorithm, 195, 196
Sugiyama Algorithm, 190, 191
Sum-Product Algorithm, 602
Syndrome Decoding Algorithm, 42, 43
Turbo Algorithm, 610
turbo code, 607
Two-Way a Posteriori Probability (APP), 587, 592
Viterbi Algorithm, 551, 556
degree of a convolutional code, 558
degree of a point, 528
degree of a row, 553, 558
degree of a vector, 559
delay, 548
delay element, 129
Delsarte Bound, 440
Delsarte’s Theorem, 119
demodulation, 207
demodulator, 574
descendant, 460
design, 291
Assmus–Mattson Theorem, 303
automorphism group of, 292
automorphism of, 292
balanced incomplete block design, 294
block of, 291
code from, 308
complementary, 298, 302
derived, 298, 302
equivalent, 292
extended Pascal triangle, 297
Hadamard, 334
held by a code, 293
held by binary Golay code, 299, 300, 305, 306, 401
octad, 300
sextet, 300
tetrad, 299
held by duadic code, 321
held by extremal code, 349
held by Hamming code, 292, 293, 297, 299, 306
held by Pless symmetry code, 421
held by Reed–Muller code, 306
held by ternary Golay code, 305
incidence matrix, 291
intersection numbers λi^j, 295
Pascal triangle, 296
point of, 291
projective plane, see projective plane
quasi-symmetric, 314
replication number, 294
residual, 299, 302
self-complementary, 298
simple, 291
Steiner system, 291
symmetric, 291
automorphism of, 321, 322
Bruck–Ryser–Chowla Theorem, 319
code from, 309, 310
order of, 291
difference set, 322
cyclic, 322
development of, 323
multiplier, 327
normalized block, 328
symmetric design, 323
direct sum, 18
discrepancy, 187
discrete memoryless channel (DMC), 39,
576
8-ary output, 576
distance, 7, 563
minimum, 8
relative, 89, 541
distance distribution, 75, 472
distance invariant, 472
divisible code, 11, 338
Gleason–Pierce–Ward Theorem, 339, 389
Division Algorithm, 102
divisor, 11, 86, 157, 338
divisor of a rational function, 533
divisor on a curve, 531
degree of, 531
effective, 531
intersection, 531
support, 531
DMC, 39
dodecacode, 388
doubly-even code, 12, 361
number of, 361
doubly-even vector, 275, 277
duadic code, 209
codeword weights, 229
dual, 223
even-like, 210
existence, 220, 222
extended, 226
generating idempotent, 210, 233
Hermitian dual, 223, 224
Hermitian self-orthogonal, 223
minimum weight, 231, 233, 234
odd-like, 210
quadratic residue, see quadratic residue code
self-orthogonal, 222, 224, 321
splitting, 210
splitting of n, 212
Square Root Bound, 230
dual code, 5, 469
Hermitian, 7
eight-to-fourteen modulation (EFM), 206
Elias Bound, 74
asymptotic, 93
elliptic curve, 528–533, 538
encoder, 1, 547
standard form of, 607
systematic, 37, 607
encoding, 37
compact disc, 204
cyclic code, 128
energy of a signal, 573
energy per bit, 577
entropy, 90
equivalence class, 144, 233, 365, 518
equivalent codes, 25, 384
monomially, 24, 281, 469
permutation, 20, 468
equivalent designs, 292
equivalent Hadamard matrices, 331
equivalent lattices, 424
erasure, 44
error, 44
burst, 200
error evaluator polynomial, 190
error function, 575
error location number, 180
error locator polynomial, 181, 196
error magnitude, 180
error vector, 1, 40, 179
Euclidean Algorithm, 102
Euclidean weight, 470
Euler φ-function, 105, 141, 217
Euler totient, 105, 217
evaluation map, 175, 534
even code, 11
even-like code, 12, 210
even-like vector, 12, 209
extend, 14
extended code, 14
Golay G 12 , 33
Golay G 24 , 32
Hamming, 15, 29
external degree, 558
extremal code, 346, 386
extrinsic term, 609
Fermat curve, 526, 527, 529, 539
field, 2, 100
adjoining a root, 108
automorphism, 111
binary, 3
characteristic of, 100
conjugate elements, 114
extension, 111
finite, 100
order of, 100
Frobenius automorphism, 112
Galois group, 112
normal basis, 441
prime subfield, 100
primitive element, 104
quaternary, 3
splitting, 122
subfield, 110
ternary, 3
trace function Trt , 119
field of rational functions, 533
finite weight matrix, 570
First MRRW Bound, 94
fixed element, 112
flip-flop, 129
flow along a path, 584
flow between vertices, 584
form H, O, or A, 277, 393
formally self-dual code, 307, 338
Balance Principle, 379
extremal, 346
isodual, 378
minimum distance, 344, 345
Forney indices, 561
frame, 204
free distance, 563
free module, 467
basis, 467
Frobenius automorphism, 112, 528
Galileo, 613
Gallager Hard Decision Decoding Algorithm, 599
Galois field, see field
Galois group, 112, 506
Galois ring, 505
2-adic representation, 506
Frobenius automorphism, 506
Galois group, 506
primitive element, 506
primitive polynomial, 507
General Viterbi Algorithm, 584, 586
Generalized Griesmer Bound, 287
generalized Hamming weight, 289
generalized Hamming weight dr (C), 283
dual code, 284
extended binary Golay code, 285
Generalized Griesmer Bound, 289
simplex code, 283, 290
generalized Reed–Muller (GRM) code Rq (r, m), 524
order of, 524
generalized Reed–Solomon (GRS) code GRSk (γ, v),
176, 520
as algebraic geometry code, 537
as MDS code, 176, 178
dual, 176
extended, 177
extended narrow sense RS, 176
generator matrix, 177
parity check matrix, 177
Sudan–Guruswami Decoding Algorithm, 195, 196
Generalized Singleton Bound, 286
generator matrix, 4
of code, 4
of lattice, 423
standard form of, 4, 21, 469
systematic, 607
genus of a curve, 532
Gilbert Bound, 86
asymptotic, 94, 541
Giotto, 613
Gleason’s Theorem, 341, 385
Gleason–Pierce–Ward Theorem, 339, 389
Gleason–Prange Theorem, 249
glue element, 370
glue vector, 370
Golay codes, binary G 23 , G 24 , 32, 49, 397, 613
as duadic code, 211
as lexicode, 99
as quadratic residue code, 240, 401
automorphism group, 251
complete coset weight distribution of, 306
covering radius, 402
decoding with the hexacode, 407
design held by, 299, 300, 305, 306, 401
extended, 32
generating idempotent of, 401
generator polynomial of, 401
Leech lattice from G 24 , 429
octad, 300, 429
odd Golay code, 358
PD-set of, 404
perfect, 48, 49
permutation decoding, 404
sextet, 300, 429
tetrad, 299, 429
uniqueness of, 49, 401
weight enumerator of, 261, 272, 302
Golay codes, ternary G 11 , G 12 , 33, 49, 413
as duadic code, 213
as quadratic residue code, 243, 418
automorphism group, 419
covering radius, 419
design held by, 305
extended, 33
generating idempotent of, 134, 418
generator polynomial of, 134, 418
perfect, 48, 49
uniqueness of, 49, 418
weight enumerator of, 260, 272
Goppa code Γ(L, G), 521
meets Asymptotic Gilbert–Varshamov Bound, 542
parity check matrix, 522
Graeffe’s method, 479
Gram matrix, 423, 504
graph, 456
chromatic number, 456
coloring of, 456
edge, 456
Kneser, 456
Tanner, 593
vertex, 456
Gray map, 472
Green machine, 611
Griesmer Bound, 81
generalized, 287
generalized Hamming weight, 289
group, 22
affine, 165, 251, 366, 368
automorphism, 22, 26, 139, 292, 384,
469
Galois, 112, 506
Mathieu, 251, 402, 419
of units, 217
order of an element, 105
projective general linear, 422
projective special linear, 22, 249, 251,
402
symmetric, 22
transitive, 23
GRS, 176
H-plane, 275
Hadamard matrix, 331, 612
design from, 334
equivalent, 331
normalized, 331
order of, 331
Reed–Muller code, 333
Hamming Bound, 48
asymptotic, 92
Hamming code Hr or Hq,r , 5, 29
as BCH code, 169
as duadic code, 211
as lexicode, 99
automorphism group of H3, 22
automorphism group of Ĥ3, 251
complete coset weight distribution of Ĥ3, 45, 266
design held by, 292, 293, 297, 299, 306
dual of, 30
generating idempotent of H3 , 133, 142
generator polynomial of H3 , 133, 142
Gosset lattice from Ĥ3, 428
Syndrome Decoding Algorithm, 43
weight distribution of, 261
word error rate of H3 , 46
Hamming distance distribution, see distance
distribution
hard decision decoding, 573
Hartmann–Tzeng Bound, 153
Hensel’s Lemma, 477
Hermitian curve, 527, 529, 538, 539
Hermitian dual, 7
Hermitian inner product, 7
Hermitian self-dual code, 7, 26, 228, 234, 245, 338,
344
Classification Algorithm, 367
design from, 349
Gleason’s Theorem, 341
mass formula, 366
minimum distance, 344, 345
number of, 362
Type IV, 339, 362
extremal, 346
hexacode G 6 , 7, 383, 405
as duadic code, 214
as extended Hamming code, 406
as quadratic residue code, 240
automorphism group of, 27, 406
covering radius, 406
to decode the extended binary Golay code, 407
uniqueness of, 405
Hilbert entropy function Hq (x), 90, 541
homogeneous coordinates, 518
homogenization, 519
Hou Bound, 458
hull of a code, 275
Huygens, 614
ideal, 106
minimal, 136
primary, 476
principal, 106, 125
idempotent, 132
generating, 132, 485, 487
primitive, 136
incidence matrix, 291
independent sequence, 154
information rate, 47, 88
information set, 4, 13, 71
information transmission rate, 577
inner distribution, 75
integral domain, 101
principal ideal domain, 106
interleave to depth t, 203
interleaver, 604
internal degree, 559
intersection multiplicity, 529
iterative decoding, 593, 598, 599, 602, 607,
610
Jet Propulsion Laboratory, 612
Johnson Bound, 65, 74
JPL, 612
Kerdock code K(r + 1), 509
weight distribution, 513
key equation, 191
Klein quartic, 527, 529, 540
Krawtchouck polynomial Kk^(n,q)(x), 75, 256
Lagrange Interpolation Formula, 537
land, 204
lattice, 423, 503
basis, 423
Construction A, 427
Construction A4 , 503
density, 428
determinant, 423
discriminant, 423
dual, 424
equivalent, 424
from code, 427, 503
fundamental parallelotope, 423
generator matrix, 423, 427, 504
Gosset, 428, 505
Gram matrix, 423, 504
integral, 423
even, 425
odd, 425
self-dual, 424
Type I, 425, 427, 504
Type II, 425, 427, 504
unimodular, 424
kissing number, 426
Leech, 429, 505
norm, 425
packing, 426
planar hexagonal, 423
point of, 423
theta series, 426
Law of Quadratic Reciprocity, 219
LDPC, 598
Lee weight, 470
Legendre symbol, 219
lexicode, 97
binary Golay, 99
binary Hamming, 99
line, 230
linear code, 2
linear feedback shift-register, 129, 190
linear feedforward shift-register, 549
Linear Programming Bound, 78
asymptotic, 94
linear shift-register, 129
list-decoding algorithm, 195
low density parity check (LDPC) code, 598
Gallager Hard Decision Decoding Algorithm,
599
Sum-Product Decoding Algorithm, 602
Lucas’ Theorem, 166
MacWilliams equations, 252, 253, 257, 384
MAP, 39
Mariner, 611
mass formula, 365, 386, 498, 499
Massey–Sain Theorem, 570
Mathieu group, 251, 402, 419
matrix, 4
bordered circulant, 31, 376
bordered reverse circulant, 377
circulant, 376
finite weight, 570
generator, 4, 547
standard form of, 4, 21
Gram, 423, 504
Hadamard, see Hadamard matrix
incidence, 291
minor of, 559
monomial, 24
diagonal part of, 24
permutation part of, 24
parity check, 4
permutation, 20
reverse circulant, 32, 377
right inverse, 38
unimodular, 559
Vandermonde, 151
maximum a posteriori probability (MAP) decoding,
39
maximum distance separable (MDS) code, 71
bound, 264
extended GRS, 177
generalized Reed–Solomon, 176
Generalized Singleton Bound, 286
MDS Conjecture, 265
Reed–Solomon, 174
trivial, 72, 264
weight distribution of, 262
maximum likelihood (ML) decoding, 40, 580
McEliece’s Theorem, 157
MDS, 71
MDS Conjecture, 265
meet, 53
Meggitt Decoding Algorithm, 158–160
memory, 546, 548, 553, 561
merge bits, 206
message, 1
message associated with a path, 555
message error, 569
message passing, 594
Message Passing Decoding Algorithm, 595
minimum distance, 8
minimum support weight dr (C), see generalized
Hamming weight
minimum weight, 8
even-like, 15
odd-like, 15
minor, 559
ML, 40
modulator, 573
monomial automorphism group, 26, 469
monomial matrix, 24
diagonal part of, 24
permutation part of, 24
monomially equivalent codes, 24, 281, 469
MRRW, 93
multiplicity at a point, 529
multiplicity of a root, 195
multiplier, 138, 210, 486
of difference set, 327
splitting of n by, 212
nearest neighbor decoding, 40
nearly perfect binary code, 69
punctured Preparata code, 70
Newton identities, 187
Newton radius, 465
noise, 1, 40, 574
additive white Gaussian, 575
nonlinear code, 53
nonsingular curve, 526
nonsingular point, 526
Nordstrom–Robinson code N 16 , 68
from octacode, 475
optimal, 69
uniqueness of, 69
weight distribution of, 69
norm of a code, 451
norm of a vector, 425
normal code, 452
normalized set, 328
Norse Bounds, 435
Nyquist rate, 578
octad, 300, 429
odd-like code, 12, 210
odd-like vector, 12, 209
optimal code, 53
Nordstrom–Robinson, 69
order ordn (q), 122
order of a group element, 105
orphan, 460
orthogonal code, 5
orthogonal sum, 276
orthogonal vectors x⊥y, 275
oval, 318
overall constraint length, 561
overall parity check, 15
packing radius, 41
parallel concatenated convolutional code (PCCC), 604
parallel concatenated encoder, 604
parent, 358, 460
parity check matrix, 4
partial derivative, 526
PCCC, 604
PD-set, 403
of binary Golay code, 404
perfect code, 48, 49
Golay, 48
Hamming, 48, 49
trivial, 49
permutation automorphism group, 22, 139, 469
Permutation Decoding Algorithm, 402, 403
binary Golay code, 404
PD-set, 403
permutation equivalent codes, 20, 468
permutation matrix, 20
permuter, 604, 607
Peterson–Gorenstein–Zierler Decoding Algorithm, 179, 182
Pioneer, 612
pit, 204
Pless power moments, 256, 259
Pless symmetry code, 420
automorphism group, 421, 422
design held by, 421
extremal, 421
Plotkin Bound, 58
asymptotic, 89
Plücker’s Formula, 532
point, 29, 195, 230, 291, 423, 517, 518
affine, 518
at infinity, 518
degree of, 528
multiplicity at, 529
nonsingular, 526
rational, 528
simple, 526
singular, 526
polynomial, 101
basic irreducible, 476
check, 146, 510
coefficient of, 101
coprime, 477
degree of, 101
derivative, 122
divides, 102
Division Algorithm, 102
divisor of, 102
error evaluator, 190
error locator, 181, 196
Euclidean Algorithm, 102
factor of, 102
generator, 126
generator matrix, 547
Gleason, 342
Goppa, 522
greatest common divisor, 102
homogeneous, 519
homogenization of, 519
irreducible, 106, 476
Krawtchouck, 75, 256
leading coefficient of, 102
minimal, 112
monic, 102, 476
primary, 476
primitive, 108, 507
reciprocal, 116, 145, 483
regular, 478
relatively prime, 102, 477
reversible, 145
syndrome, 159
term of, 101
degree of, 101
weight enumerator, see weight enumerator
weighted degree of, 195
Prange’s Theorem, 271
predictable degree property, 559
Preparata code P(r + 1), 515
as a nearly perfect code, 70
weight distribution, 516
primary ideal, 476
prime subfield, 100
primitive element, 104
primitive root of unity, 105, 122
principal ideal, 106, 125
principal ideal domain, 106
probability, 39
a posteriori, 587, 608
a priori, 608
crossover, 39, 583
maximum likelihood, 40
projective closure, 526
projective general linear group, 422
projective geometry, 29, 319
point of, 29, 170
projective line, 518
projective plane, 230, 291, 315, 518
as a symmetric design, 315
code from, 316
cyclic, 230, 321
nonexistence, 324, 326
held by duadic code, 231
line of, 230
of order ten, 329
order of, 230, 291
oval, 318
point of, 230
projective plane curve, 526
genus, 532
projective space, 518
affine point, 518
homogeneous coordinates, 518
point at infinity, 518
projective special linear group, 22, 249, 251, 402
puncture, 13
punctured code, 13
hexacode, 214, 240
Preparata, 70
Reed–Muller, 225
QR, 237
quadratic non-residue, 237
quadratic residue, 237, 323, 335
quadratic residue (QR) code, 237
automorphism group, 248, 251
binary Golay code, 240, 246, 250
existence, 237, 239, 241, 243, 244
extended, 245–248
generating idempotent, 238, 239, 241–244
Gleason–Prange Theorem, 249
Hermitian self-orthogonal, 241
hexacode, 246, 250
minimum weight, 249
over Z4, 490, 492
punctured hexacode, 240
self-orthogonal, 241, 244
ternary Golay code, 243, 246, 250
quantization, 575
binary, 575
quantum error-correction, 383
quasi-perfect code, 50
quaternary field, 3
quick-look-in convolutional code, 612
radius, 40
covering, see covering radius
Newton, 465
packing, 41
rate, 47, 88, 546
rational point, 528
receiver, 1
recursive systematic convolutional (RSC) encoder, 607
reduced generator matrix, 559
reduction homomorphism, 476
redundancy, 1, 4
Redundancy Bound, 433
redundancy set, 4
Reed–Muller (RM) code R(r, m), 33
as duadic code, 225
covering radius, 437, 438
design held by, 306
dimension of, 34
dual of, 34
from Hadamard matrix, 333
generalized, see generalized Reed–Muller code
minimum weight of, 34
order of, 34
Reed–Solomon (RS) code, 173, 520, 613
as algebraic geometry code, 536
as MDS code, 174
burst error-correcting, 202
cross-interleaved (CIRC), 204
encoding, 175
extended narrow sense, 176
generalized, see generalized Reed–Solomon code
Peterson–Gorenstein–Zierler Decoding Algorithm, 179, 182
Sudan–Guruswami Decoding Algorithm, 195, 196
Sugiyama Decoding Algorithm, 190, 191
regular curve, 526
relative distance, 89
relative trace function TRr, 508
replicated code, 390
residual code Res(C, c), 80
residue class ring, 107, 121
residue of a function, 524
Restricted Johnson Bound, 61
reverse circulant matrix, 32, 377
Riemann–Roch Theorem, 535
right inverse, 38
ring, 101, 475
commutative, 101
Galois, see Galois ring
ideal of, 106
integral domain, 101
residue class, 107, 121
with unity, 101, 132
RM, 34
root of unity, 105
RS, 173
RSC, 607
Second MRRW Bound, 94
self-complementary code, 435
self-dual code, 6, 26, 227, 229, 230, 233, 234, 245–248, 317, 338, 340, 344, 359, 364, 469
Balance Principle, 351
bound, 344, 346, 495
child, 358, 375
Classification Algorithm, 366
covering radius, 444
design from, 349
Gleason polynomials, 342
Gleason’s Theorem, 341
Hermitian, see Hermitian self-dual code
mass formula, 366
minimum distance, 344–346
number of, 359, 362, 374, 375
parent, 358
shadow, 353, 355, 356
Type I, 339, 427, 495
extremal, 346
Type II, 339, 361, 363, 427, 495
extremal, 346
Type III, 339, 362
extremal, 346
Z4 -, 469, 495
self-orthogonal code, 6, 316, 360, 363, 368, 469
Hermitian, 7
number of, 360, 363
semiring, 584
sextet, 300, 429
shadow, 353, 355, 356
Shannon, Claude, 1
Shannon limit, 577, 603
Shannon’s Theorem, 2, 46, 47, 577
shorten, 16
shortened code, 16
MDS, 202
Reed–Solomon, 202, 204
signal-to-noise ratio, 577
signaling power, 577
simple point, 526
simplex code, 30, 36, 82, 282
covering radius, 439
generalized Hamming weight, 283, 290
meet the Griesmer Bound, 82
weight distribution of, 30, 82
Singleton Bound, 71
asymptotic, 89
generalized, 286
singly-even code, 12
singly-even vector, 275, 277
singular point, 526
smooth curve, 526
soft decision decoding, 573
Soft Decision Viterbi Decoding Algorithm, 580, 581, 612
source, 1
sphere, 40
Sphere Covering Bound, 434
Sphere Packing Bound, 48, 59, 74
splitting, 210
splitting field, 122
splitting of n, 212
Square Root Bound, 230
state diagram, 551, 552
state of an encoder, 551–553
Stirling numbers S(r, ν), 256
Stirling’s Formula, 90
strength, 435
subcode, 5
subfield subcode C|Fq , 116
dual of, 119
of a cyclic code, 128
parity check matrix, 117
Sudan–Guruswami Decoding Algorithm, 195, 196
Sugiyama Decoding Algorithm, 190, 191
Sum-Product Decoding Algorithm, 602
Supercode Lemma, 434
support supp(D) of a code, 283
support supp(c) of a vector, 120
survivor path, 555
symmetry code, see Pless symmetry code
syndrome, 42, 51, 179
Syndrome Decoding Algorithm, 42, 43
systematic encoder, 37, 607
systematic generator matrix, 607
systematic term, 609
Tanner graph, 593
check node, 594
Message Passing Decoding Algorithm, 595
variable node, 594
t-design, see design
tensor product, 332
ternary field, 3
t-error-correcting code, 41
tetracode H3,2, 6
as Hamming code, 30
automorphism group of, 26
tetrad, 299, 429
theorem
Assmus–Mattson, 303
Bruck–Ryser–Chowla, 319
Bézout, 531
Delsarte, 119
Gleason, 341
Gleason–Pierce–Ward, 339
Gleason–Prange, 249
Hensel’s Lemma, 477
Law of Quadratic Reciprocity, 219
Lucas, 166
Massey–Sain, 570
McEliece, 157
Plücker’s Formula, 532
Prange, 271
Riemann–Roch, 535
Shannon, 2, 46, 47, 577
Supercode Lemma, 434
trace code Trt(C), 119
dual of, 119
trace function Trt, 119
transitive, 23
trellis diagram, 554
flow along a path, 584
flow between vertices, 584
truncated trellis, 555
turbo code, 602
decoding, 607
encoding, 604
interleaver, 604
parallel concatenated convolutional code, 604
parallel concatenated encoder, 604
permuter, 604, 607
Turbo Decoding Algorithm, 610
Turbo Decoding Algorithm, 610
two-sided power spectral density, 575
Two-Way a Posteriori Probability (APP) Decoding Algorithm, 587, 592
Type I code, 339, 385, 495
extremal, 346, 386
Type II code, 339, 361, 363, 385, 495
extremal, 346, 386
Type III code, 339, 362
extremal, 346
Type IV code, 339, 362
extremal, 346
(u | u + v) construction, 19
unimodular matrix, 559
unique factorization domain, 106
unit, 217
unity, 132
Unrestricted Johnson Bound, 63
van Lint–Wilson Bounding Technique, 154
Vandermonde matrix, 151
Varshamov Bound, 88
asymptotic, 94, 541
vector, 1
ρ-covered, 432
complement of, 333
cover, 459
doubly-even, 275, 277
error, 1, 40, 179
even-like, 12, 209
glue, 370
odd-like, 12, 209
received, 1, 41
singly-even, 275, 277
support of, 120
syndrome of, 42, 179
trace of, 119
Viking, 611
Viterbi Decoding Algorithm, 551, 556
Voyager, 613
weight, 8, 470, 562
coset, 41
distribution, see weight distribution
enumerator, see weight enumerator
Euclidean, 470
generalized Hamming, see generalized Hamming weight
Hamming, 8, 470
hierarchy, 283, 284
Lee, 470
minimum, 8
minimum even-like, 15
minimum odd-like, 15
minimum support, see generalized Hamming weight
spectrum, see weight distribution
weight distribution, 252
MacWilliams equations, 252, 253
Pless power moments, 256, 259
weight enumerator WC(x) or WC(x, y), 255, 257
complete, 273
Lee, 274
weight hierarchy, 283, 284
weight of a path, 555
weight of an edge, 555
weight preserving linear transformation, 279
weighted degree, 195
word error rate, 46
Z4 -linear code, 467
complete weight enumerator, 471
cyclic, see cyclic code over Z4
distance invariant, 472
dual, 469
Euclidean distance, 470
Euclidean weight, 470
generator matrix, 468
standard form of, 469
Gray map, 472
Hamming distance, 470
Hamming distance distribution, 472
Hamming weight, 470
Hamming weight enumerator, 471
Kerdock, see Kerdock code
lattice, see lattice
Lee distance, 470
Lee weight, 470
Lee weight enumerator, 471
monomial automorphism group, 469
monomially equivalent, 469
octacode, 474
Gosset lattice from, 505
Nordstrom–Robinson code from, 475
permutation automorphism group, 469
permutation equivalent, 468
Preparata, see Preparata code
residue, 496
self-dual, 469
cyclic, see cyclic code over Z4
Euclidean-extremal, 495
generator matrix, 497
mass formula, 498, 499
number of, 498, 499, 501
Type I, 495, 504
Type II, 495, 504
self-orthogonal, 469
symmetrized weight enumerator, 471
torsion, 496
type, 468