Coding theory originated with the advent of computers. Early computers were huge mechanical monsters whose reliability was low compared to that of the computers of today. Based, as they were, on banks of mechanical relays, if a single relay failed to close, the entire calculation was in error. The engineers of the day devised ways to detect faulty relays so that they could be replaced. While working at Bell Labs, R.W. Hamming, out of frustration with the behemoth he was using, had the thought that if the machine was capable of knowing it was in error, shouldn't it also be possible for the machine to correct that error? Setting to work on this problem, Hamming devised a way of encoding information so that if an error was detected it could also be corrected. Based in part on this work, Claude Shannon developed the theoretical framework for the science of coding theory.

What we have called Coding Theory should more properly be called the Theory of Error-Correcting Codes, since there is another, older aspect of Coding Theory which deals with the creation and decoding of secret messages. That field is called Cryptography, and we will not be interested in it. Rather, the problem that we wish to address deals with the difficulties inherent in the transmission of messages. More particularly, suppose that we wish to transmit a message and know that in the process of transmission the message will be altered, due to weak signals, sporadic electrical bursts and other naturally occurring noise that creeps into the transmission medium. The problem is to ensure that the intended message can be recovered from whatever is actually received. One simple approach to this problem is what is called a repeat code. For instance, if we wanted to send the message BAD NEWS, we could repeat each letter a certain number of times and send, say,

               BBBBBAAAAADDDDD     NNNNNEEEEEWWWWWSSSSS

Even if a number of these letters got garbled in transmission, the intended message could still be recovered from the received message by a process called majority decoding, which in this case means that for each block of 5 letters the decoded letter is the one which appears most frequently in the block. The problem with this approach is economics: the repeat code is not very efficient. The increased length of the transmitted code, and thus the increased time and energy required to transmit it, is necessary in order to be able to decode the message properly, but how efficiently a coding procedure uses this increase depends upon the coding scheme. Suppose, in our example, that the probability that a letter is garbled in transmission is p = 0.05, so that q = 1 - p = 0.95 is the probability that a letter is correctly received. Without any coding, the probability of our 8-letter (spaces included) message being correctly received is q^8 = 0.66. Using the repeat code, the probability of correctly decoding a given letter is q^5 + 5q^4p + 10q^3p^2 = 0.9988, and so the probability of getting the correct message after decoding is (0.9988)^8 = 0.99. This is clearly a great increase over the non-coded message, but the remaining 1% probability of getting the wrong message might not be acceptable for certain applications. To increase the probability of decoding the correct message with this type of code we would have to increase the number of repeats - a fix which may not be desirable or even possible in certain situations. However, as we shall see, other coding schemes can increase the probability to 0.9999 without increasing the length of the coded message.
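These probabilities are easy to check numerically. The following short sketch (in Python, not part of the original notes) simply evaluates the formulas above:

```python
# Checking the repeat-code probabilities, assuming each symbol of
# "BAD NEWS" is garbled independently with probability p = 0.05.
from math import comb

p, q = 0.05, 0.95

# Without coding: all 8 symbols must arrive intact.
p_uncoded = q ** 8                       # about 0.66

# 5-repeat code: majority decoding of a block succeeds when the block
# has at most 2 errors, i.e. q^5 + 5q^4 p + 10q^3 p^2.
p_letter = sum(comb(5, k) * q ** (5 - k) * p ** k for k in range(3))

# All 8 blocks must decode correctly.
p_message = p_letter ** 8

print(round(p_uncoded, 2), round(p_letter, 4), round(p_message, 3))
```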

Before leaving the repeat codes to look at other coding schemes, let us introduce some terminology. Each block of repeated symbols is called a Code word, i.e., a code word is what is transmitted in place of one piece of information in the original message. The set of all code words is called a Code. If all the code words in a code have the same length, then the code is called a Block code. The repeat codes are block codes. One feature that a useful code must have is the ability to detect errors. The repeat code with code words of length 5 can always detect from 1 to 4 errors made in the transmission of a code word, since any 5-letter word composed of more than one letter is not a code word. However, it is possible for 5 errors to go undetected (how?). We say that this code is 4-error detecting. Another feature is the ability to correct errors, i.e., to decode the correct information from the error-riddled received words. The repeat code we are dealing with can always correct 1 or 2 errors, but may decode a word with 3 or more errors incorrectly, so it is a 2-error correcting code.
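Encoding and majority decoding for the 5-repeat code can be sketched in a few lines (the function names below are our own invention; the scheme is the one described above):

```python
# A sketch of the 5-repeat code with majority decoding.
from collections import Counter

def repeat_encode(message, r=5):
    # Replace each symbol (spaces included) by a block of r copies.
    return "".join(ch * r for ch in message)

def majority_decode(received, r=5):
    # For each block of r symbols, pick the most frequent symbol.
    blocks = [received[i:i + r] for i in range(0, len(received), r)]
    return "".join(Counter(block).most_common(1)[0][0] for block in blocks)

coded = list(repeat_encode("BAD NEWS"))
coded[1], coded[7] = "X", "Q"            # garble two symbols in transit
print(majority_decode("".join(coded)))   # the errors are corrected
```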


Suppose that you knew that an English word was transmitted and you had received the word SHIP. If you suspected that some errors had occurred in transmission, it would be impossible to determine what word was really transmitted - it could have been SKIP, SHOP, STOP, THIS, indeed any four-letter word. The problem here is that English words are in a sense "too close" to each other. What gives a code its error correcting ability is the fact that the code words are "far apart". We shall make this distance idea more precise in a moment.

First of all we shall restrict our horizons and only consider block codes, so all code words will have the same length. Secondly, we will assume that the alphabet used to create our code words consists only of 0 and 1. This last restriction is not as limiting as it appears; after all, a computer's word-handling abilities rest ultimately on strings of 0's and 1's. We are concerned then with binary block codes. The words (code words and others) that we are dealing with are thus ordered n-tuples of 0's and 1's, where n is the length of the words. These can be viewed abstractly as elements of an n-dimensional vector space over GF(2).

The Hamming distance between two words is the number of places in which they differ. So, for example, the words (0,0,1,1,1,0) and (1,0,1,1,0,0) would have a Hamming distance of 2. The Hamming distance is a metric on the vector space, i.e., if d(x,y) denotes the Hamming distance between vectors x and y, then d satisfies:

   1) d(x,y) >= 0, with equality if and only if x = y,
   2) d(x,y) = d(y,x), and
   3) d(x,z) <= d(x,y) + d(y,z)   (the triangle inequality)

[prove this]. Since we will only deal with the Hamming distance (there are other metrics used in Coding Theory), we will generally omit the "Hamming" and simply talk about the distance between words.

The minimum distance of a code C is the smallest distance between any pair of distinct code words in C (assuming that C is finite). It is the minimum distance of a code that measures a code's error correcting capabilities. If the minimum distance of a code C is 2e + 1, then C is an e-error correcting code, since if e or fewer errors are made in a code word, the resulting word is closer to the original code word than it is to any other code word and so can be correctly decoded.

The weight of a word is the number of non-zero components in the vector. Alternatively, the weight is the distance of the word from the zero vector. Examining the weights of the code words sometimes gives useful information about a particular code.
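The notions of distance, weight, and minimum distance defined above can be expressed directly as functions on binary words (tuples of 0's and 1's); the function names in this sketch are ours:

```python
# Hamming distance, weight, and minimum distance of a code.
from itertools import combinations

def hamming_distance(x, y):
    # Number of coordinates in which x and y differ.
    return sum(a != b for a, b in zip(x, y))

def weight(x):
    # Number of non-zero coordinates = distance from the zero word.
    return sum(a != 0 for a in x)

def minimum_distance(code):
    # Smallest distance over all pairs of distinct code words.
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

print(hamming_distance((0, 0, 1, 1, 1, 0), (1, 0, 1, 1, 0, 0)))  # 2
print(weight((1, 0, 1, 1, 0, 0)))                                # 3
```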

An important class of codes is the class of linear codes: those whose code words form a vector subspace. If the vector space of all words is n-dimensional and the subspace is k-dimensional, then we refer to the subspace as an (n,k)-linear code.

In general, finding the minimum distance of a code requires comparing every pair of distinct elements. For a linear code, however, this is not necessary.

Proposition VI.1.1 - In a linear code the minimum distance is equal to the minimal weight among all non-zero code words.

Proof: Let x and y be distinct code words in the code C; then x - y is a non-zero code word in C, since C is linear. We have d(x,y) = d(x-y,0), which is the weight of x - y. Conversely, the weight of any non-zero code word z is d(z,0), a distance between two code words of C. Hence the set of distances between distinct code words coincides with the set of weights of non-zero code words, and in particular their minima agree.

We shall now look at two ways of describing a linear code C. The first is given by a generator matrix G, which has as its rows a set of basis vectors of the linear subspace C. Since the property we are most interested in is possible error-correction, and this property is not changed if we interchange two symbol positions in all code words (e.g. the first and second letter of each code word), we shall call two codes equivalent if one can be obtained by applying a fixed permutation of symbol positions to the words of the other. With this in mind we see that for every linear code there is an equivalent code which has a generator matrix of the form G = [I_k P], where I_k is the k by k identity matrix and P is a k by n-k matrix. We call this the standard form of G.

We now come to the second description of a linear code C. The orthogonal complement of C, i.e. the set of all vectors which are orthogonal to every vector in C [orthogonal = inner product is 0], is a subspace and thus another code, called the dual code of C and denoted by C'. If H is a generator matrix for C', then H is called a parity check matrix for C. In general a parity check for the code C is a vector x which is orthogonal to all code words of C, and we shall call any matrix H a parity check matrix for C if the rows of H generate the dual code of C. Therefore, a code C is defined by such a parity check matrix H as follows:

C = { x | xH^t = 0 }.

Let us consider an example. Let C be the (7,4)-linear code generated by the rows of G:

                      1  0  0  0  1  1  0 
               G =    0  1  0  0  0  1  1 
                      0  0  1  0  1  1  1
                      0  0  0  1  1  0  1
We get the 16 code words by multiplying G on the left by the 16 different row vectors of length 4 over GF(2), i.e., by forming the products aG. They are:
0 0 0 0 0 0 0      1 1 0 1 0 0 0      0 1 1 0 1 0 0     0 0 1 1 0 1 0
0 0 0 1 1 0 1      1 0 0 0 1 1 0      0 1 0 0 0 1 1     1 0 1 0 0 0 1
1 1 1 1 1 1 1      0 0 1 0 1 1 1      1 0 0 1 0 1 1     1 1 0 0 1 0 1
1 1 1 0 0 1 0      0 1 1 1 0 0 1      1 0 1 1 1 0 0     0 1 0 1 1 1 0 
Notice that there are 7 code words of weight 3, 7 of weight 4, 1 of weight 7 and 1 of weight 0. Since this is a linear code, its minimum distance equals the minimal non-zero weight, which is 3, and so it is a 1-error correcting code.
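As a check, the 16 code words and their weight distribution can be generated directly from G; this sketch forms the products aG over GF(2) as described above:

```python
# Generating the (7,4) code as the products aG over GF(2), where a
# runs through all binary row vectors of length 4.
from itertools import combinations, product
from collections import Counter

G = [(1, 0, 0, 0, 1, 1, 0),
     (0, 1, 0, 0, 0, 1, 1),
     (0, 0, 1, 0, 1, 1, 1),
     (0, 0, 0, 1, 1, 0, 1)]

def encode(a):
    # aG with arithmetic mod 2: dot each column of G with a.
    return tuple(sum(ai * gi for ai, gi in zip(a, col)) % 2
                 for col in zip(*G))

code = [encode(a) for a in product((0, 1), repeat=4)]

# Weight distribution: 1 word of weight 0, 7 of weight 3,
# 7 of weight 4, 1 of weight 7 -- as noted above.
print(sorted(Counter(sum(w) for w in code).items()))

# For a linear code, minimum distance = minimum non-zero weight = 3.
print(min(sum(x != y for x, y in zip(u, v))
          for u, v in combinations(code, 2)))
```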

A parity check matrix for this code is given by

               1  0  1  1  1  0  0
       H  =    1  1  1  0  0  1  0
               0  1  1  1  0  0  1
[Verify this].
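One way to carry out this verification is by machine: the sketch below checks that every row of G is orthogonal, mod 2, to every row of H, from which xH^t = 0 follows for every code word x.

```python
# Checking that the rows of G and the rows of H are orthogonal mod 2.
G = [(1, 0, 0, 0, 1, 1, 0),
     (0, 1, 0, 0, 0, 1, 1),
     (0, 0, 1, 0, 1, 1, 1),
     (0, 0, 0, 1, 1, 0, 1)]

H = [(1, 0, 1, 1, 1, 0, 0),
     (1, 1, 1, 0, 0, 1, 0),
     (0, 1, 1, 1, 0, 0, 1)]

assert all(sum(g * h for g, h in zip(grow, hrow)) % 2 == 0
           for grow in G for hrow in H)
print("every row of G is orthogonal to every row of H over GF(2)")
```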

This code is generally known as the (7,4)-Hamming Code, one of a series of linear codes due to Hamming and Golay.


Let H be an Hadamard matrix of order 4m. Take the rows of H and the rows of -H and change all the -1 entries to 0's. This gives us 8m vectors of length 4m over GF(2). By the properties of Hadamard matrices, the distance between any two distinct vectors is either 2m or 4m. [Verify] So, considering these rows as forming a code, the minimum distance is 2m and the code is (m-1)-error correcting. Codes formed in this way are called Hadamard Codes or, if the Hadamard matrix is formed by taking direct products of the order 2 Hadamard matrix, Reed-Muller Codes of the first kind.
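The construction is easy to try out in miniature. The sketch below builds the order-8 direct-product (Sylvester) Hadamard matrix, i.e. 4m = 8 and m = 2, and checks that the resulting 16 code words have distances 2m and 4m only:

```python
# The Hadamard code from the order-8 direct-product Hadamard matrix.
from itertools import combinations

def sylvester(n):
    # Hadamard matrix of order 2^n via repeated direct product of
    # the order 2 matrix [[1, 1], [1, -1]].
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H = sylvester(3)                                  # order 8
rows = H + [[-x for x in row] for row in H]       # rows of H and of -H
code = [tuple(1 if x == 1 else 0 for x in row) for row in rows]

dists = sorted({sum(a != b for a, b in zip(u, v))
                for u, v in combinations(code, 2)})
print(dists)      # only the distances 2m = 4 and 4m = 8 occur
```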

To examine the process of using codes we shall look at a real application. The Mariner 9 was a space probe whose mission was to fly by Mars and transmit pictures back to Earth. The black and white camera aboard the Mariner 9 took the pictures, a fine grid was placed over each picture, and for each square of the grid the degree of blackness was measured on a scale from 0 to 63. These numbers, expressed in binary, are the data that was transmitted to Earth (more precisely, to the Jet Propulsion Laboratory of the California Institute of Technology in Pasadena). On arrival the signal is very weak and must be amplified. Noise from space added to the signal and thermal noise from the amplifier have the effect that occasionally a signal transmitted as a 1 is interpreted by the receiver as a 0, and vice versa. If the probability that this occurs is 0.05, then by the calculation done in the introduction, if no coding were done approximately 26% of the received picture would be incorrect. Thus, there is clearly a need to code this information with an error correcting code. Now the question is, what code should be used? Any code will increase the size of the data being sent, and this creates a problem. The Mariner 9 was a small vehicle and could not carry a huge transmitter, so the transmitted signal had to be directional; but over the long distances involved a directional signal has alignment problems. So, there was a maximum to how much data could be transmitted at one time (while the transmitter was aligned). This turned out to be about 5 times the size of the original data, so, since the data consisted of 6 bits ((0,1)-vectors of length 6), the code words could be about 30 bits long. The 5-repeat code was a possibility, having the advantage that it is very easy to implement, but it is only 2-error correcting.
An Hadamard code based on an Hadamard matrix of order 32, on the other hand, would be 7-error correcting and so worth the added difficulty of implementation. Using this code, the probability of error in the picture is reduced to only 0.01% (the 5-repeat code would give a probability of error of about 1%).
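The two failure probabilities quoted here can be recomputed from the binomial distribution; the sketch below assumes independent bit errors with p = 0.05.

```python
# Comparing per-picture-value failure probabilities of the two codes.
from math import comb

p, q = 0.05, 0.95

# 5-repeat code: a bit is decoded wrongly when its block of 5 has 3 or
# more errors; a 6-bit picture value fails if any of its bits fails.
p_bit_ok = sum(comb(5, k) * q ** (5 - k) * p ** k for k in range(3))
p_repeat_fail = 1 - p_bit_ok ** 6
print(round(p_repeat_fail, 3))        # about 0.007, i.e. roughly 1%

# 7-error correcting Hadamard code: a 32-bit code word fails only when
# 8 or more of its bits are in error.
p_hadamard_fail = 1 - sum(comb(32, k) * q ** (32 - k) * p ** k
                          for k in range(8))
print(round(p_hadamard_fail, 4))      # about 0.0001, the quoted 0.01%
```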

We now turn our attention to the problems of coding and decoding using an Hadamard code. At first glance, coding doesn't seem to be a problem; after all, there are 64 data types and 64 code words, so any arbitrary assignment of data type to code word will work. The problem lies in the fact that the Mariner 9 was small, and this approach would require storing all 64 of the 32-bit code words. It turns out to be more economical, in terms of space and weight, to design hardware that will actually calculate the code words rather than read them out of a stored array. By choosing the Hadamard matrix correctly, the Hadamard code turns out to be a linear code, and so this calculation is simply a matter of multiplying the data by the generator matrix of the code. The correct choice for the Hadamard matrix is the one obtained by repeatedly taking the direct product of the order 2 Hadamard matrix. [Prove that such an Hadamard code is linear by induction].
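The bracketed exercise can at least be checked numerically for a small order: the sketch below verifies that the code obtained from the order-8 direct-product Hadamard matrix (under the -1 to 0 substitution) is closed under addition mod 2, hence linear.

```python
# A numerical linearity check for the order-8 direct-product code.
def sylvester(n):
    # Hadamard matrix of order 2^n by repeated direct product.
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H = sylvester(3)
rows = H + [[-x for x in row] for row in H]
code = {tuple(1 if x == 1 else 0 for x in row) for row in rows}

assert tuple([0] * 8) in code                      # contains the zero word
assert all(tuple((a + b) % 2 for a, b in zip(u, v)) in code
           for u in code for v in code)            # closed under addition
print("the order-8 direct-product Hadamard code is linear")
```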

Now consider the decoding problem. A simple scheme for decoding is as follows. A received signal, i.e. a sequence of 32 zeros and ones, is first changed into its ±1 form (by changing each 0 to a -1). If the result is the vector x and there are no errors, then xH^t, where H is the original Hadamard matrix, will be a vector with 31 components equal to 0 and one component equal to ±32. In the presence of errors these numbers are changed, but if the number of errors is at most 7 then the values 0 can increase in absolute value to at most 14 and the value 32 can decrease to no less than 18. Thus the entry of maximal absolute value in xH^t tells us which row of H, or of -H if that entry is negative, was transmitted. While this is the actual algorithm used to decode the Mariner 9 signals, it is a bit slow from the computational point of view (requiring 32^2 = 1024 multiplications and the corresponding additions for each code word), so a number of computational tricks are employed to reduce the actual computation to less than 1/3 of what the algorithm calls for.
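The decoding rule can be sketched on the order-8 version of the code (order 32 works identically, apart from the vector length):

```python
# Decoding a noisy Hadamard code word via the correlations x H^t.
def sylvester(n):
    # Hadamard matrix of order 2^n by repeated direct product.
    H = [[1]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H = sylvester(3)
sent = [1 if x == 1 else 0 for x in H[5]]     # transmit row 5 of H
received = sent[:]
received[2] ^= 1                              # one bit error in transit

x = [1 if b == 1 else -1 for b in received]   # back to the +-1 form
corr = [sum(xi * hi for xi, hi in zip(x, row)) for row in H]   # x H^t

# With no errors the matching entry would be +-8; one error lowers it
# to 6, still well above the others (which are +-2 here). A positive
# sign means a row of H, a negative sign a row of -H.
best = max(range(len(corr)), key=lambda i: abs(corr[i]))
print(best, corr[best])                       # 5 6
```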


There are many texts devoted to Coding Theory, a few with special significance are:

E.R. Berlekamp, Algebraic Coding Theory, McGraw-Hill, N.Y. 1968.

I.F. Blake and R.C. Mullin, An Introduction to Algebraic and Combinatorial Coding Theory, Academic Press, N.Y., 1976.

P.J. Cameron and J.H. Van Lint, Graph Theory, Coding Theory and Block Designs, Cambridge University Press, Cambridge, 1975.

W.W. Peterson and E.J. Weldon, Jr., Error-Correcting Codes, MIT Press, Cambridge, 1972.

V. Pless, Introduction to the Theory of Error-Correcting Codes, Wiley, New York, 1982.

F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, North Holland, Amsterdam, 1977.

The Mariner 9 mission and the coding theory used in that project are the subjects of:

J.H. Van Lint, "Coding, decoding and Combinatorics", in Applications of Combinatorics, ed. R.J. Wilson, Shiva, Cheshire, 1982.

E.C. Posner, "Combinatorial Structures in Planetary Reconnaissance" in Error Correcting Codes, ed. H.B. Mann, Wiley, N.Y. 1968.
