Coding theory originated with the advent of computers. Early computers were huge mechanical monsters whose reliability was low compared to the computers of today. Because they were built from banks of mechanical relays, the failure of a single relay to close put the entire calculation in error. The engineers of the day devised ways to detect faulty relays so that they could be replaced. While R.W. Hamming was working for Bell Labs, out of frustration with the behemoth he was working with, the thought occurred to him that if the machine was capable of knowing it was in error, might it not also be possible for the machine to correct that error? Setting to work on this problem, Hamming devised a way of encoding information so that if an error was detected it could also be corrected. Based in part on this work, Claude Shannon developed the theoretical framework for the science of coding theory.

What we have called Coding Theory should more properly be called the Theory of Error-Correcting Codes, since there is another aspect of Coding Theory which is older and deals with the creation and decoding of secret messages. This field is called Cryptography and we will not be interested in it. Rather, the problem that we wish to address deals with the difficulties inherent in the transmission of messages. More particularly, suppose that we wished to transmit a message and knew that in the process of transmission there would be some altering of the message, due to weak signals, sporadic electrical bursts and other naturally occurring noise that creeps into the transmission medium. The problem is to ensure that the intended message is obtainable from whatever is actually received. One simple approach to this problem is what is called a repeat code. For instance, if we wanted to send the message BAD NEWS, we could repeat each letter a certain number of times and send, say, BBBBBAAAAADDDDD     NNNNNEEEEEWWWWWSSSSS. Even if a number of these letters got garbled in transmission, the intended message could be recovered from a received message that might look like BBBEBFAAAADGDDD  .  MNNNTEEEEEWWWSWRRSSS, by a process called majority decoding, which in this case would mean that for each block of 5 letters the intended letter is the one which appears most frequently in the block. The problem with this approach is one of economics: the repeat code is not very efficient. The increased length of the transmitted code, and thus the increased time and energy required to transmit it, is necessary in order to be able to decode the message properly, but how efficiently a coding procedure uses this increase depends upon the coding scheme. Suppose, in our example, that the probability that a letter is garbled in transmission is p = 0.05 and so q = 1 - p = 0.95 is the probability that a letter is correctly received.
Without any coding, the probability of our 8-letter (spaces included) message being correctly received is $q^8 = 0.66$. Using the repeat code, the probability of correctly decoding a given letter is $q^5 + 5q^4p + 10q^3p^2 = 0.9988$ and so the probability of getting the correct message after decoding is $(0.9988)^8 = 0.990$, clearly a great increase over the non-coded message, but this 1% probability of getting the wrong message might not be acceptable for certain applications. To increase the probability of decoding the correct message with this type of code we would have to increase the number of repeats - a fix which may not be desirable or even possible in certain situations. However, as we shall see, other coding schemes could increase the probability to 0.9999 without increasing the length of the coded message.
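The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the helper name `block_success` is ours, and it counts a block as correctly decoded only when at most 2 of its 5 letters are garbled, exactly as the formula in the text does.

```python
from math import comb

p = 0.05              # probability that a single letter is garbled
q = 1 - p             # probability that a single letter arrives intact
MESSAGE_LENGTH = 8    # "BAD NEWS", space included

def block_success(repeats: int, q: float) -> float:
    """Probability that fewer than half of the `repeats` copies are garbled."""
    p = 1 - q
    max_errors = (repeats - 1) // 2
    return sum(comb(repeats, e) * q ** (repeats - e) * p ** e
               for e in range(max_errors + 1))

uncoded = q ** MESSAGE_LENGTH            # q^8
per_letter = block_success(5, q)         # q^5 + 5 q^4 p + 10 q^3 p^2
coded = per_letter ** MESSAGE_LENGTH     # whole message after majority decoding

print(f"uncoded message:         {uncoded:.4f}")
print(f"one letter, 5-repeat:    {per_letter:.4f}")
print(f"whole message, 5-repeat: {coded:.4f}")
```

Calling `block_success(7, q)` or `block_success(9, q)` shows how slowly the reliability improves as the number of repeats grows.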

Before leaving the repeat codes to look at other coding schemes, let us introduce some terminology. Each block of repeated symbols is called a Code word, i.e., a code word is what is transmitted in place of one piece of information in the original message. The set of all code words is called a Code. If all the code words in a code have the same length, then the code is called a Block code. The repeat codes are block codes. One feature that a useful code must have is the ability to detect errors. The repeat code with code words having length 5 can always detect from 1 to 4 errors made in the transmission of a code word, since any 5 letter word composed of more than one letter is not a code word. However, it is possible for 5 errors to go undetected (how?). We would say that this code is 4-error detecting. Another feature is the ability to correct errors, i.e., being able to decode the correct information from the error riddled received words. The repeat code we are dealing with can always correct 1 or 2 errors, but may decode a word with 3 or more errors incorrectly, so it is a 2-error correcting code.
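As an illustration of the 5-repeat code and majority decoding, here is a minimal Python sketch. The function names are hypothetical, and the garbled string is the received message from the example above, split into blocks of 5.

```python
from collections import Counter

def encode_repeat(message: str, repeats: int = 5) -> str:
    """Replace every symbol by a code word consisting of `repeats` copies."""
    return "".join(symbol * repeats for symbol in message)

def decode_majority(received: str, repeats: int = 5) -> str:
    """Decode each block to the symbol that occurs most often in it."""
    blocks = [received[i:i + repeats] for i in range(0, len(received), repeats)]
    return "".join(Counter(block).most_common(1)[0][0] for block in blocks)

sent = encode_repeat("BAD NEWS")

# The received message from the text: no block contains more than 2 errors.
received = "BBBEB" "FAAAA" "DGDDD" "  .  " "MNNNT" "EEEEE" "WWWSW" "RRSSS"
decoded = decode_majority(received)
print(decoded)

# Three errors in one block can fool the decoder: an A sent, a B decoded.
print(decode_majority("AABBB"))
```

The last call shows why the code is only 2-error correcting: once three copies of a letter are replaced by the same wrong letter, the majority vote goes the wrong way.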


Suppose that you knew that an English word was transmitted and you had received the word SHIP. If you suspected that some errors had occurred in transmission, it would be impossible to determine what word was really transmitted - it could have been SKIP, SHOP, STOP, THIS, actually any four letter word. The problem here is that English words are in a sense "too close" to each other. What gives a code its error correcting ability is the fact that the code words are "far apart". We shall make this distance idea more precise in a moment.

First of all we shall restrict our horizons and only consider block codes, so all code words will have the same length. Secondly, we will assume that the alphabet used to create our code words consists only of 0 and 1. This last restriction is not as limiting as it appears; after all, a computer's word handling abilities rest ultimately on strings of 0's and 1's. We are concerned then with binary block codes. The words (code words and others) that we are dealing with are thus ordered n-tuples of 0's and 1's, where n is the length of the words. These can be viewed abstractly as elements of an n-dimensional vector space over GF(2).

The Hamming distance between two words is the number of places in which they differ. So, for example, the words (0,0,1,1,1,0) and (1,0,1,1,0,0) would have a Hamming distance of 2. This Hamming distance is a metric on the vector space, i.e., if $d(x,y)$ denotes the Hamming distance between vectors x and y, then $d$ satisfies:

  1. $d(x,x) = 0$
  2. $d(x,y) = d(y,x)$, and
  3. $d(x,y) + d(y,z) \ge d(x,z)$
[prove this].

Since we will only deal with the Hamming distance (there are other metrics used in Coding Theory), we will generally omit the word Hamming and talk about the distance between words.

The minimum distance of a code C is the smallest distance between any pair of distinct code words in C (assuming that C is finite). It is the minimum distance of a code that measures a code's error correcting capabilities. If the minimum distance of a code C is 2e + 1, then C is an e-error correcting code, since if e or fewer errors are made in a code word, the resulting word is closer to the original code word than it is to any other code word and so can be correctly decoded.
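The definitions of distance, minimum distance and nearest-code-word decoding translate directly into code. The following Python sketch uses names of our own choosing and, as a small test case, the binary 5-repeat code, whose minimum distance is 5 = 2·2 + 1.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of places in which two words of equal length differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))

def minimum_distance(code):
    """Smallest distance between any pair of distinct code words."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

def nearest_codeword(word, code):
    """Decode a received word to the closest code word."""
    return min(code, key=lambda c: hamming_distance(word, c))

# The pair of words used as an example in the text.
example = hamming_distance((0, 0, 1, 1, 1, 0), (1, 0, 1, 1, 0, 0))

# Binary 5-repeat code: two code words at distance 5, so e = 2.
repeat5 = [(0,) * 5, (1,) * 5]
d = minimum_distance(repeat5)
e = (d - 1) // 2

# Two errors in 00000 are still decoded correctly.
decoded = nearest_codeword((1, 0, 1, 0, 0), repeat5)
print(example, d, e, decoded)
```

Comparing every pair of code words, as `minimum_distance` does, becomes expensive for large codes; this is what motivates the shortcut for linear codes.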

The weight of a word is the number of non-zero components in the vector. Alternatively, the weight is the distance of the word from the zero vector. Examining the weights of the code words sometimes gives useful information about a particular code.

An important class of codes is the class of linear codes: these are the codes whose code words form a subspace of the vector space of all words. If the vector space of all words is n dimensional and the subspace is k dimensional then we talk about the subspace as an (n,k)-linear code.

In general, finding the minimum distance of a code requires comparing every pair of distinct elements. For a linear code however this is not necessary.

Proposition VI.1.1 - In a linear code the minimum distance is equal to the minimal weight among all non-zero code words.

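Proof: Let x and y be code words in the code C, then $x - y \in C$ since C is linear. We then have $d(x,y) = d(x-y,0)$, where d is the Hamming distance, and this is the weight of x-y. Thus every distance between distinct code words is the weight of some non-zero code word. Conversely, the weight of a non-zero code word x is its distance from the code word 0, so the two minima coincide.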

We shall now look at two ways of describing a linear code C. The first is given by a generator matrix G which has as its rows a set of basis vectors of the linear subspace C. Since the property we are most interested in is possible error-correction, and this property is not changed if in all code words we interchange two symbols (e.g. the first and second letter of each code word), we shall call two codes equivalent if one can be obtained by applying a fixed permutation of symbols to the words of the other code. With this in mind we see that for every linear code there is an equivalent code which has a generator matrix of the form $G = [I_k \; P]$, where $I_k$ is the $k \times k$ identity matrix and P is a $k \times (n-k)$ matrix. We call this the standard form of G.

We now come to the second description of a linear code C. The orthogonal complement of C, i.e. the set of all vectors which are orthogonal to every vector in C [orthogonal = inner product is 0], is a subspace and thus another code called the dual code of C, and denoted by $C^d$. If H is a generator matrix for $C^d$ then H is called a parity check matrix for C. In general a parity check for the code C is a vector x which is orthogonal to all code words of C and we shall call any matrix H a parity check matrix if the rows of H generate the dual code of C. Therefore, a code C is defined by such a parity check matrix H as follows:

$C = \{\, x \mid Hx^t = 0 \,\}$.
Let us consider an example. Let C be the (7,4)-linear code generated by the rows of G:
G =
1 1 0 1 0 0 0
0 1 1 0 1 0 0
0 0 1 1 0 1 0
0 0 0 1 1 0 1
We get the 16 code words by multiplying G on the left by the 16 different row vectors of length 4 over GF(2). They are:
0 0 0 0 0 0 0      1 1 0 1 0 0 0      0 1 1 0 1 0 0     0 0 1 1 0 1 0
0 0 0 1 1 0 1      1 0 0 0 1 1 0      0 1 0 0 0 1 1     1 0 1 0 0 0 1
1 1 1 1 1 1 1      0 0 1 0 1 1 1      1 0 0 1 0 1 1     1 1 0 0 1 0 1
1 1 1 0 0 1 0      0 1 1 1 0 0 1      1 0 1 1 1 0 0     0 1 0 1 1 1 0 
Notice that there are 7 code words of weight 3, 7 of weight 4, 1 of weight 7 and 1 of weight 0. Since this is a linear code, the minimum distance of this code is 3 and so it is a 1-error correcting code.
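The weight count, and the agreement between minimum weight and minimum distance promised by Proposition VI.1.1, can be checked by brute force. In the Python sketch below the rows of `G` are an assumption on our part: they are four of the sixteen listed code words (1101000 and its next three cyclic shifts), which are linearly independent and therefore generate the code.

```python
from collections import Counter
from itertools import product

# Assumed basis: four of the code words listed in the text.
G = [
    (1, 1, 0, 1, 0, 0, 0),
    (0, 1, 1, 0, 1, 0, 0),
    (0, 0, 1, 1, 0, 1, 0),
    (0, 0, 0, 1, 1, 0, 1),
]

def encode(u, G):
    """Multiply the row vector u by G over GF(2)."""
    return tuple(sum(ui * gi for ui, gi in zip(u, column)) % 2
                 for column in zip(*G))

codewords = [encode(u, G) for u in product((0, 1), repeat=len(G))]

# Weight distribution: how many code words have each weight.
weights = Counter(sum(c) for c in codewords)

# Minimum weight over non-zero code words versus minimum pairwise distance.
min_weight = min(sum(c) for c in codewords if any(c))
min_distance = min(sum(a != b for a, b in zip(x, y))
                   for i, x in enumerate(codewords)
                   for y in codewords[i + 1:])

print(sorted(weights.items()), min_weight, min_distance)
```

The pairwise comparison needs 120 distance computations here, while the weight check needs only 15, which is the practical content of the proposition.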

A parity check matrix for this code is given by

H =
[Verify this].
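A verification of this kind can be mechanised. The Python sketch below is hedged in two ways: the rows of `G` are four of the listed code words taken as a basis, and `H` is one candidate parity check matrix chosen by us (any matrix whose rows span the dual code would do, and it need not be the one intended above). The check confirms that every row of `G` is orthogonal to every row of `H`, and that the vectors x with $Hx^t = 0$ are exactly the 16 code words.

```python
from itertools import product

# Assumed basis of the (7,4) code: four of the listed code words.
G = [
    (1, 1, 0, 1, 0, 0, 0),
    (0, 1, 1, 0, 1, 0, 0),
    (0, 0, 1, 1, 0, 1, 0),
    (0, 0, 0, 1, 1, 0, 1),
]

# Candidate parity check matrix (our assumption, to be verified below).
H = [
    (1, 0, 1, 1, 1, 0, 0),
    (0, 1, 0, 1, 1, 1, 0),
    (0, 0, 1, 0, 1, 1, 1),
]

def dot(x, y):
    """Inner product over GF(2)."""
    return sum(a * b for a, b in zip(x, y)) % 2

def syndrome(word, H):
    """The vector H x^t: all zeros exactly when x passes every parity check."""
    return tuple(dot(row, word) for row in H)

# Every basis vector of C is orthogonal to every row of H.
orthogonal = all(dot(g, h) == 0 for g in G for h in H)

# The words that pass all parity checks.
kernel = [w for w in product((0, 1), repeat=7) if syndrome(w, H) == (0, 0, 0)]

# Each single-bit error produces its own non-zero syndrome.
unit_vectors = [tuple(1 if k == pos else 0 for k in range(7)) for pos in range(7)]
single_error_syndromes = {syndrome(e, H) for e in unit_vectors}

print(orthogonal, len(kernel), len(single_error_syndromes))
```

Because the seven single-error syndromes are distinct and non-zero, the syndrome of a received word identifies the position of a single error, which is another way to see that the code is 1-error correcting.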

This code is generally known as the (7,4)-Hamming Code being one of a series of linear codes due to Hamming and Golay.


Let H be an Hadamard matrix of order 4m. Take the rows of H and the rows of -H and change all the -1 entries to 0's. This gives us 8m vectors of length 4m over GF(2). Now by the properties of Hadamard matrices the distance between any two distinct vectors is either 2m or 4m. [Verify] So, considering these rows as forming a code, the minimum distance is 2m and the code will be (m-1)-error correcting. Codes formed in this way are called Hadamard Codes or, if the Hadamard matrix is formed by taking direct products of the order 2 Hadamard matrix, Reed-Muller Codes of the first kind.
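The construction can be tried out for order 4m = 32. The Python sketch below builds the Hadamard matrix by repeated direct products with the order 2 matrix (one possible choice of H; the function names are ours) and collects the set of distances between distinct code words.

```python
from itertools import combinations

def sylvester(order):
    """Hadamard matrix of order 2^k, built by repeated direct products
    of [[1, 1], [1, -1]] with itself."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    H = [[1]]
    while len(H) < order:
        H = ([row + row for row in H] +
             [row + [-x for x in row] for row in H])
    return H

def hadamard_code(H):
    """Rows of H and of -H, with every -1 replaced by 0."""
    rows = H + [[-x for x in row] for row in H]
    return [tuple(1 if x == 1 else 0 for x in row) for row in rows]

code = hadamard_code(sylvester(32))          # 4m = 32, so m = 8
distances = {sum(a != b for a, b in zip(x, y))
             for x, y in combinations(code, 2)}

min_distance = min(distances)
errors_corrected = (min_distance - 1) // 2
print(len(code), sorted(distances), errors_corrected)
```

With m = 8 the distances found should be 2m = 16 and 4m = 32 only, the latter occurring between a row and its negative.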

To examine the process of using codes we shall look at a real application. The Mariner 9 was a space probe whose mission was to fly by Mars and transmit pictures back to Earth. The black and white camera aboard the Mariner 9 took the pictures, a fine grid was then placed over each picture, and for each square of the grid the degree of blackness was measured on a scale from 0 to 63. These numbers, expressed in binary, are the data that were transmitted to Earth (more precisely, to the Jet Propulsion Laboratory of the California Institute of Technology in Pasadena). On arrival the signal was very weak and had to be amplified. Noise from space added to the signal, together with thermal noise from the amplifier, meant that occasionally a signal transmitted as a 1 was interpreted by the receiver as a 0 and vice versa. If the probability that this occurs is 0.05 then, by the same calculation as in the introduction, with no coding approximately 26% of the 6-bit values in the received picture would be incorrect. Thus, there is clearly a need to code this information with an error correcting code. Now the question is, what code should be used? Any code will increase the size of the data being sent and this creates a problem. The Mariner 9 was a small vehicle and could not carry a huge transmitter, so the transmitted signal had to be directional, but over the long distances involved a directional signal has alignment problems. So, there was a maximum size to how much data could be transmitted at one time (while the transmitter was aligned). This turned out to be about 5 times the size of the original data, so since the data consisted of 6 bits (0,1 - vectors of length 6) the code words could be about 30 bits long. The 5-repeat code was a possibility, having the advantage that it is very easy to implement, but it is only 2-error correcting.
An Hadamard code based on an Hadamard matrix of order 32 on the other hand would be 7-error correcting and so worth the added difficulty of implementing it. Using this code, the probability of error in the picture is reduced to only 0.01% (the 5-repeat code would have a probability of error of about 1%).
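The three error rates quoted for the picture data can be estimated as follows. This Python sketch rests on assumptions that we state explicitly: bit errors are independent with probability 0.05, a 5-repeat block is counted as wrong whenever more than 2 of its bits are flipped, and a 32-bit Hadamard code word is counted as wrong whenever more than 7 of its bits are flipped (so the last figure is an upper estimate).

```python
from math import comb

def more_than(n, p, t):
    """Probability of more than t errors among n independent bits."""
    return 1 - sum(comb(n, e) * p ** e * (1 - p) ** (n - e)
                   for e in range(t + 1))

p = 0.05          # probability that a single bit is flipped
DATA_BITS = 6     # one grey level, 0..63

# No coding: a grey level is wrong if any of its 6 bits is flipped.
uncoded_error = 1 - (1 - p) ** DATA_BITS

# 5-repeat code: each data bit is sent 5 times and decoded by majority.
repeat_bit_error = more_than(5, p, 2)
repeat_error = 1 - (1 - repeat_bit_error) ** DATA_BITS

# Hadamard code of length 32: the grey level is lost only if
# more than 7 of the 32 transmitted bits are flipped.
hadamard_error = more_than(32, p, 7)

print(f"uncoded:  {uncoded_error:.2%}")
print(f"5-repeat: {repeat_error:.2%}")
print(f"Hadamard: {hadamard_error:.4%}")
```

Both coded schemes send roughly 30 bits per grey level, so the comparison is between two uses of about the same transmission budget.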

We now turn our attention to the problems of coding and decoding using an Hadamard code. At first glance, coding doesn't seem to be a problem: after all, there are 64 data types and 64 code words, so any arbitrary assignment of data type to code word will work. The problem lies in the fact that the Mariner 9 was small, and this approach would require storing all 64 32-bit code words. It turns out to be more economical, in terms of space and weight, to design hardware that will actually calculate the code words rather than read them out of a stored array. By choosing the Hadamard matrix correctly, the Hadamard code will turn out to be a linear code and so this calculation is simply multiplying the data by the generator matrix of the code. The correct choice for the Hadamard matrix is the one obtained by repeatedly taking the direct product of the order 2 Hadamard matrix. [Prove by induction that such an Hadamard code is linear].
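To make the "multiply by a generator matrix" idea concrete, here is a Python sketch. It is not a proof of linearity, and the details are our assumptions: the order 32 matrix is indexed from 0 with entry (r, c) equal to +1 when the binary expansions of r and c share an even number of 1's (which is what the repeated direct product produces), and the generator matrix consists of the all-ones word plus one row for each bit of the column index. The assignment of grey levels to code words used on the spacecraft may well have been different.

```python
from itertools import product

M = 5              # 2^5 = 32-bit code words
N = 2 ** M

def sylvester_entry(r, c):
    """Entry (r, c) of the order 32 direct-product Hadamard matrix."""
    return 1 if bin(r & c).count("1") % 2 == 0 else -1

# 6 x 32 generator matrix: the all-ones word, then bit k of the column index.
G = [[1] * N] + [[(c >> k) & 1 for c in range(N)] for k in range(M)]

def encode(data_bits):
    """Multiply a 6-bit row vector by G over GF(2)."""
    return tuple(sum(d * g for d, g in zip(data_bits, column)) % 2
                 for column in zip(*G))

# All 64 code words obtained from the generator matrix.
linear_code = {encode(bits) for bits in product((0, 1), repeat=M + 1)}

# The Hadamard code: rows of H and of -H with every -1 replaced by 0.
hadamard_code = set()
for r in range(N):
    row = [sylvester_entry(r, c) for c in range(N)]
    hadamard_code.add(tuple(1 if x == 1 else 0 for x in row))    # row of H
    hadamard_code.add(tuple(1 if x == -1 else 0 for x in row))   # row of -H

print(len(linear_code), linear_code == hadamard_code)
```

Only the 6 rows of `G` need to be built into the encoder, instead of a table of 64 code words.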

Now consider the decoding problem. A simple scheme for decoding is as follows: A received signal, i.e. a sequence of 32 zeros and ones, is first changed into its ±1 form (by changing each 0 to -1). If the result is the vector x and if there are no errors, then $xH^t$, where H is the original Hadamard matrix, will be a vector with 31 components equal to 0 and one component equal to ±32. In the presence of errors these numbers are changed, but if the number of errors is at most 7 then the values 0 can increase in absolute value to at most 14 and the value 32 can decrease to no less than 18. Thus the entry of $xH^t$ of largest absolute value will tell us which row of H, or of -H if that entry is negative, was transmitted. While this is the actual algorithm used to decode the Mariner 9 signals, it is a bit slow from the computational point of view (requiring $32^2$ multiplications and the corresponding additions for each code word), so a number of computational tricks are employed to reduce the actual computation to less than 1/3 of what the algorithm calls for.
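The decoding rule just described is short enough to write out in full. The Python sketch below is the plain version with 32 × 32 multiplications, not the speeded-up one; rows are indexed from 0, the matrix entry formula is the closed form of the repeated direct product, and the seven error positions are an arbitrary choice for illustration.

```python
N = 32

def sylvester_entry(r, c):
    """Entry (r, c) of the order 32 direct-product Hadamard matrix."""
    return 1 if bin(r & c).count("1") % 2 == 0 else -1

H = [[sylvester_entry(r, c) for c in range(N)] for r in range(N)]

def to_bits(row):
    """Replace every -1 by 0."""
    return [1 if x == 1 else 0 for x in row]

def decode(received_bits):
    """Return (row, sign): sign +1 means row `row` of H was transmitted,
    sign -1 means the same row of -H."""
    x = [1 if b == 1 else -1 for b in received_bits]       # back to +-1 form
    correlations = [sum(xi * hi for xi, hi in zip(x, row)) for row in H]
    best = max(range(N), key=lambda r: abs(correlations[r]))
    return best, (1 if correlations[best] > 0 else -1)

# Transmit row 19 of -H and flip seven bits (hypothetical error positions).
sent = to_bits([-h for h in H[19]])
received = list(sent)
for position in (0, 3, 8, 13, 21, 27, 31):
    received[position] ^= 1

print(decode(received))
```

With seven errors the correlation with row 19 has absolute value 18 while every other correlation has absolute value at most 14, so the maximum still picks out the transmitted row.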


Latin squares have been used in many ways to produce codes of various types. We shall look at a few examples, using both single squares and sets of MOLS.

Elspas, Minnick and Short in 1963 proposed the use of unipotent symmetric Latin squares to design single error correction codes where the code words have weight 2. A unipotent Latin square is one in which the elements on the main left to right diagonal are all the same. Suppose that the information we wish to transmit consists of n-bit binary words of weight 2. There are n(n-1)/2 such words. The code words are formed by adding a number of check bits to the ends of the words so that the words with a 1 in the same position receive different check bits. This can be done in the following way: Suppose that there is a unipotent symmetric Latin square of order n, say L, using the symbols 1,2,...,n, where n is the symbol on the main diagonal. If a word has 1's in positions i and j, then the check bits added to this word will be the binary representation of the (i,j)-th entry in L. So, for example with n = 6 we have the unipotent symmetric Latin square,

L =

[Table: code words, each listed as info bits followed by check bits]

For example, the first word has 1's in the first and second positions; to obtain the appropriate check bits, look in L at the (1,2) position, which contains 2, or 010 in binary. To see how this code works consider the received signal 0 1 1 1 0 0 1 0 0. Since the first 6 bits contain three ones, we know that an error has been committed. Assuming that only one error is present we know that one of these three 1's is incorrect, but which one? The three possible words that could have been transmitted all have different check bits, and under our assumption we know that the check bit portion of the received word is correct, so the transmitted word must have been 0 1 1 0 0 0 1 0 0. The use of the Latin square ensures that if only one error occurs, the check bits of the possible code words are distinct. [Verify this].
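The whole scheme can be sketched in Python for n = 6. The square used here is an assumption: it is one unipotent symmetric Latin square obtained from a round-robin construction of our choosing, and it is not claimed to be the square L of the example (although it agrees with the two entries the text mentions). Positions are counted from 0 in the code.

```python
from itertools import combinations

def unipotent_symmetric_square(n):
    """One unipotent symmetric Latin square of even order n on the symbols
    1..n, with n on the main diagonal (round-robin construction)."""
    if n % 2:
        raise ValueError("n must be even")
    m = n - 1
    square = [[n] * n for _ in range(n)]
    for i in range(m):
        for j in range(m):
            if i != j:
                square[i][j] = (i + j) % m + 1
        square[i][m] = square[m][i] = (2 * i) % m + 1
    return square

N = 6
CHECK_BITS = 3                      # enough for the symbols 1..5 in binary
SQUARE = unipotent_symmetric_square(N)

def info_word(i, j):
    return [1 if k in (i, j) else 0 for k in range(N)]

def encode(i, j):
    """Code word for the weight-2 word with 1's in positions i and j."""
    check = [int(b) for b in format(SQUARE[i][j], f"0{CHECK_BITS}b")]
    return info_word(i, j) + check

def decode(word):
    """Recover the information bits, assuming at most one bit is in error."""
    info, check = word[:N], word[N:]
    ones = [k for k, bit in enumerate(info) if bit]
    value = int("".join(map(str, check)), 2)
    if len(ones) == 2:              # any error lies in the check bits
        return list(info)
    if len(ones) == 3:              # a 0 was received as a 1
        candidates = [pair for pair in combinations(ones, 2)
                      if SQUARE[pair[0]][pair[1]] == value]
    elif len(ones) == 1:            # a 1 was received as a 0
        i = ones[0]
        candidates = [(i, j) for j in range(N)
                      if j != i and SQUARE[i][j] == value]
    else:
        raise ValueError("more than one error")
    if len(candidates) != 1:
        raise ValueError("more than one error")
    return info_word(*candidates[0])

print(encode(0, 1))
print(decode([0, 1, 1, 1, 0, 0, 1, 0, 0]))
```

The two Latin square properties do the work: entries in a common row or column are distinct, so the candidate words sharing a position always carry different check bits.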

Since this coding scheme depends upon the existence of a unipotent symmetric Latin square, the following theorem is of interest.

Theorem VI.3.1 - There exist unipotent symmetric Latin squares of order n for every even integer n, but none exists if n is any odd order greater than 1.

Proof: [Exercise]

In 1963, G.B. Olderogge proposed a coding scheme based on a pair of orthogonal mates. His technique produced 2-error correcting codes with code words having length $n^2 + 4n + 1$ containing $n^2$ information bits, by using pairs of orthogonal mates of order n ($n \ne 6$). We will demonstrate this method by an example with n = 5.

Suppose that the information to be transmitted is the sequence -

0 1 1 0 0 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 0 1 0 1 0
We divide the sequence into blocks of length 5 and stack them to form a 5×5 matrix of 0's and 1's. This matrix is then bordered with parity check bits, that is, we find the digit which will make the sum of the column or row equal to 0 mod 2. This gives us matrix K:

K =
0 1 1 0 0 | 0
1 0 1 1 0 | 1
1 1 0 1 1 | 0
0 1 1 0 1 | 1
0 1 0 1 0 | 0
----------+
0 0 1 1 0

Next we construct an auxiliary 5×5 matrix M whose entries are the ordered pairs of a superimposed pair of orthogonal mates of order 5. One such is:

M =

Now, using K and M we construct another 5×5 matrix L of 0's and 1's by filling the (i,j)th cell of L with the element in K which is in the position given by the ordered pair in the (i,j)th position of M. Thus the (4,2) entry in L is 1 since the (3,1) entry of K is 1. L is then bordered with parity check bits as K was. The result is:

L =

The code word is now formed by the 25 message bits followed by the row check bits of K, the column check bits of K, the row check bits of L, the column check bits of L and one parity check bit for all the preceding bits. Thus, the code word of this example is:

Under the assumption that no more than two errors have occurred we can decode the message as follows: The last bit will tell us how many errors have occurred, if it equals the weight mod 2 of this vector then 0 or 2 errors have occurred, otherwise only 1 error has occurred. We shall analyze the case where 2 errors are committed and leave the 1 error case as an exercise. The matrices K and L are again formed with the received word and their check bits are recalculated and compared with the transmitted check bits. If all are in agreement then no error has been committed. Suppose that two errors have been made in the body of the message, in different rows and columns of K. In this case, two row and two column check bits of K will not agree with their transmitted counterparts. This gives us four cells in K in which the two errors are located. To find the errors we examine the check bits of L. So, if the received word (for the previous code word) gives us this K,


we know that two of the cells (1,2),(1,3),(3,2) or (3,3) are in error. Forming the L matrix and computing its check bits gives:


which indicates errors in cells (1,3),(1,4),(2,3) or (2,4) of L, but these are (using M) the cells (3,3),(4,4),(5,1) and (1,2) of K. So the two cells in error must be (3,3) and (1,2). If the two errors occur in the same row or column of K, then there will be two errors in the check bits of the columns (rows) of K and none in the rows (columns). L can again be used to find where the errors are. If one or more errors occur amongst the check bits themselves, they can easily be detected and corrected [verify].
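The encoding half of this scheme can be sketched in Python. Several things here are assumptions made for illustration: the pair of orthogonal Latin squares is taken to be (i + j) mod 5 and (i + 2j) mod 5, which need not be the pair behind the matrix M of the example, and cells are indexed from 0. The check bits of K do not depend on that choice, but those of L, and hence the full code word, do.

```python
def parity_border(matrix):
    """Row and column check bits that make every row and column sum even."""
    row_checks = [sum(row) % 2 for row in matrix]
    column_checks = [sum(column) % 2 for column in zip(*matrix)]
    return row_checks, column_checks

def encode(bits, n=5):
    """Code word of length n^2 + 4n + 1 for n^2 information bits."""
    if len(bits) != n * n:
        raise ValueError("expected n^2 information bits")
    # K: the information bits stacked into an n x n matrix.
    K = [bits[r * n:(r + 1) * n] for r in range(n)]
    # M: ordered pairs from two superimposed orthogonal Latin squares
    # (an illustrative pair, valid for n = 5).
    M = [[((i + j) % n, (i + 2 * j) % n) for j in range(n)] for i in range(n)]
    # L: cell (i, j) holds the entry of K addressed by the pair in M.
    L = [[K[a][b] for (a, b) in row] for row in M]
    k_rows, k_columns = parity_border(K)
    l_rows, l_columns = parity_border(L)
    word = list(bits) + k_rows + k_columns + l_rows + l_columns
    return word + [sum(word) % 2]          # overall parity bit

message = [0, 1, 1, 0, 0,
           1, 0, 1, 1, 0,
           1, 1, 0, 1, 1,
           0, 1, 1, 0, 1,
           0, 1, 0, 1, 0]
codeword = encode(message)
print(len(codeword), codeword[25:30], codeword[30:35])
```

Because the 25 ordered pairs in M are all different, L is a rearrangement of the cells of K, so two cells that share a row or column of K generally land in different rows and columns of L. That is what lets the check bits of L separate the four suspect cells in the decoding argument.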

We will now consider a non-binary code, that is a code that is based on an alphabet larger than { 0,1 }. If there are q symbols in the alphabet then we talk about a q-ary code. An (n,w,d) q-ary code is a block code whose code words have length n, with w code words of minimal distance d, over an alphabet of q elements. The following theorem brings to light a connection between certain codes and sets of MOLS.

Theorem VI.3.2 - An $(n, q^2, n-1)$ q-ary code is equivalent to a set of n-2 MOLS of order q.

Proof: We first show how to construct the code from the set of MOLS. We will assume that the q symbols used in the Latin squares are {1,2,...,q}. Let the Latin squares be denoted by $L_1, L_2, L_3, \ldots, L_{n-2}$. The code words are the n-tuples $(i, j, a_{ij}^{(1)}, a_{ij}^{(2)}, \ldots, a_{ij}^{(n-2)})$, where $1 \le i,j \le q$ and $a_{ij}^{(k)}$ is the (i,j)th entry in the square $L_k$. Clearly, there are $q^2$ code words of length n in this code. Consider two different code words. They can agree in at most one place in the last n-2 places since if they agreed in two places, the corresponding Latin squares when superimposed would contain an ordered pair twice, contrary to the orthogonality of the squares. If the two code words agreed in either the first or second position then they can agree in none of the last n-2 places, for if they did this would imply that some square had the same symbol twice in a row or column, contradicting the fact that they are Latin squares. Nor can they agree in both of the first two positions, since they would then be the same code word. In every case two different code words agree in at most one place. Thus, the minimal distance between code words is n-1.

On the other hand, if we are given such a code, it is easy to see that reversing the procedure will produce n-2 MOLS. The pairwise orthogonality will follow from the distance properties of the code.
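The first half of the proof can be checked on a small case. The Python sketch below takes q = 5 and uses the squares with (i,j)th entry (k·i + j) mod 5 for k = 1,...,4, a standard complete set of MOLS of prime order chosen by us for illustration, so that n - 2 = 4 and n = 6.

```python
from itertools import combinations

def mols(q):
    """q - 1 mutually orthogonal Latin squares of prime order q:
    square k has (i, j) entry (k*i + j) mod q, shifted to symbols 1..q."""
    return [[[(k * i + j) % q + 1 for j in range(q)] for i in range(q)]
            for k in range(1, q)]

def code_from_mols(squares):
    """Code words (i, j, a_ij^(1), ..., a_ij^(n-2)) as in the proof."""
    q = len(squares[0])
    return [(i + 1, j + 1) + tuple(square[i][j] for square in squares)
            for i in range(q) for j in range(q)]

q = 5
code = code_from_mols(mols(q))
n = len(code[0])                       # word length: 2 + (n - 2)
w = len(code)                          # number of code words
d = min(sum(a != b for a, b in zip(x, y))
        for x, y in combinations(code, 2))

print(n, w, d)
```

For these parameters $q^{n-d+1} = 5^2 = 25$, which is exactly the number of code words, so this code has as many words as the length and distance allow.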

These Latin square codes have a theoretical importance. It can be shown that in an (n,w,d) q-ary code,

$w \le q^{n-d+1}$.
This is known as the Joshi bound, i.e. an upper bound on the number of code words for given q, n and d. This bound is not always attained, but it is attained by some of the Latin square codes when q is an order for which a complete set of MOLS exists.


There are many texts devoted to Coding Theory; a few with special significance are:

E.R. Berlekamp, Algebraic Coding Theory, McGraw-Hill, N.Y. 1968.

W.W. Peterson and E.J. Weldon, Jr., Error-Correcting Codes, MIT Press, Cambridge, 1972.

V. Pless, Introduction to the Theory of Error-Correcting Codes, Wiley, New York, 1982.

F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, North Holland, Amsterdam, 1977.

I.F. Blake and R.C. Mullin, An Introduction to Algebraic and Combinatorial Coding Theory, Academic Press, N.Y., 1976.

One monograph of special mention, as it relates to the next chapter, is,

P.J. Cameron and J.H. Van Lint, Graph Theory, Coding Theory and Block Designs, Cambridge University Press, Cambridge, 1975 or their 1991 edition which was renamed (and revised) as Designs, Graphs, Codes and their Links.

The Mariner 9 mission and the coding theory used in that project are the subjects of,

J.H. Van Lint, "Coding, decoding and Combinatorics", in Applications of Combinatorics, ed. R.J. Wilson, Shiva, Cheshire, 1982.

E.C. Posner, "Combinatorial Structures in Planetary Reconnaissance" in Error Correcting Codes, ed. H.B. Mann, Wiley, N.Y. 1968.