Linear Feedback Shift Registers

The key distribution problem for One-Time Pad suggests that one might use an algorithm to generate the random sequence needed as the key (transfer of only a short seed would then be needed).

However, no algorithm using a finite state machine can produce a truly random sequence, since the finiteness forces the sequence to be periodic. The best we can do is use very long period sequences, called pseudo-random sequences.

What properties should a pseudo-random sequence have to make it look like a random sequence?

Golomb's Principles

  • G1: The number of zeros and ones should be as equal as possible per period.
  • G2: Half the runs in a period have length 1, one-quarter have length 2, ..., and in general a fraction $1/2^i$ have length $i$. Moreover, for any length, half the runs are blocks and the other half gaps. (A block is a run of ones, of the form ...011110..., and a gap is a run of zeros, of the form ...10000001...; either type is called a run.)
  • G3: The out-of-phase autocorrelation AC(k) has the same value for all k.
  • AC(k) = (Agreements - Disagreements)/p where we are comparing a sequence of period p and its shift by k places. The autocorrelation is out-of-phase if p does not divide k.
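The definition of AC(k) can be made concrete with a short sketch. The function name `autocorrelation` and the period-15 bit string used as input are illustrative assumptions, not part of the notes' notation; exact fractions are used so that the out-of-phase values can be compared directly.

```python
from fractions import Fraction


def autocorrelation(period_bits, k):
    """AC(k) = (agreements - disagreements) / p for one period of a binary sequence."""
    p = len(period_bits)
    agreements = sum(
        1 for i in range(p) if period_bits[i] == period_bits[(i + k) % p]
    )
    disagreements = p - agreements
    return Fraction(agreements - disagreements, p)


# Illustrative input: one period of a period-15 binary sequence.
example = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1]

# Out-of-phase shifts are those k that p does not divide, here k = 1..14.
out_of_phase = [autocorrelation(example, k) for k in range(1, len(example))]
print(set(out_of_phase))
```

For this particular input all fourteen out-of-phase values coincide, which is what G3 asks for.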

    Furthermore, to be of practical use for cryptologists we would require:

  • C1: The period should be very long (~ $10^{50}$ at a minimum).
  • C2: The sequence should be easy to generate (for fast encryption).
  • C3: The cryptosystem based on the sequence should be cryptographically secure against a chosen-plaintext attack (the minimum level of security for modern cryptosystems).

Feedback Shift Registers

    Feedback Shift Registers are a commonly used method of producing pseudo-random sequences.

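    An FSR of length n (n-stage) consists of:

      • n storage cells (stages) holding the bits $s_0, s_1, \ldots, s_{n-1}$ of the current state, and
      • a feedback function $f(s_0, s_1, \ldots, s_{n-1})$ that computes the next bit to enter the register.

    At each time step the leftmost bit $s_0$ is output, the remaining bits shift one place to the left, and the value of $f$ enters on the right.

    We first consider the case that f is a linear function, i.e.,

    $$f(s_0, s_1, \ldots, s_{n-1}) = c_0 s_0 + c_1 s_1 + \cdots + c_{n-1} s_{n-1} \pmod 2, \qquad c_i \in \{0, 1\}.$$

    The output of this LFSR is determined by the initial values $s_0, s_1, \ldots, s_{n-1}$ and the linear recursion relationship:

    $$s_{k+n} = \sum_{i=0}^{n-1} c_i s_{k+i} \pmod 2, \qquad k \ge 0,$$

    or equivalently

    $$\sum_{i=0}^{n} c_i s_{k+i} = 0 \pmod 2,$$

    where $c_n = 1$ by definition.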

    Ex: Let n = 4, $c_0 = c_2 = c_3 = 1$, $c_1 = 0$ with initial state (0,1,1,0). Then we have

    Time   LFSR State   Output
    0      0,1,1,0      -
    1      1,1,0,1      0
    2      1,0,1,0      1
    3      0,1,0,0      1
    4      1,0,0,0      0
    5      0,0,0,1      1
    6      0,0,1,1      0
    7      0,1,1,0      0
    Since we have reached the initial state again, this LFSR produces a sequence with period 7. An n-stage register has $2^n$ possible states, but the all-zero state cannot be reached unless you start with it, so at most $2^n - 1$ states can lie on one cycle, and $2^n - 1$ is the maximum possible period.
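The stepping rule used in this example can be sketched in a few lines of Python. The helper names `lfsr_steps` and `lfsr_period` are assumptions made for illustration; the convention (output the leftmost bit, shift left, append the feedback bit on the right) is the one read off from the table.

```python
def lfsr_steps(coeffs, state, steps):
    """Simulate an n-stage LFSR.

    coeffs = [c_0, ..., c_{n-1}] and state = [s_0, ..., s_{n-1}].
    Each step outputs the leftmost bit, shifts left, and appends
    c_0*s_0 + ... + c_{n-1}*s_{n-1} (mod 2) on the right.
    Returns (list of states visited, list of output bits).
    """
    state = list(state)
    states, outputs = [tuple(state)], []
    for _ in range(steps):
        feedback = sum(c & s for c, s in zip(coeffs, state)) % 2
        outputs.append(state[0])
        state = state[1:] + [feedback]
        states.append(tuple(state))
    return states, outputs


def lfsr_period(coeffs, state):
    """Number of steps until the starting state first recurs (assumes c_0 = 1)."""
    start, current, count = tuple(state), list(state), 0
    while True:
        feedback = sum(c & s for c, s in zip(coeffs, current)) % 2
        current = current[1:] + [feedback]
        count += 1
        if tuple(current) == start:
            return count


# The example from the text: n = 4, c0 = c2 = c3 = 1, c1 = 0, start (0,1,1,0).
states, outputs = lfsr_steps([1, 0, 1, 1], [0, 1, 1, 0], 7)
for t, s in enumerate(states):
    out = "-" if t == 0 else outputs[t - 1]
    print(t, ",".join(map(str, s)), out)
```

The assumption $c_0 = 1$ in `lfsr_period` matters: it makes the state update invertible, so every starting state is guaranteed to recur.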

    A sequence produced by a length n LFSR which has period $2^n - 1$ is called a PN-sequence (or a pseudo-noise sequence).

    We can characterize the LFSR's that produce PN-sequences. We define the characteristic polynomial of an LFSR as the polynomial

    $$f(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n = \sum_{i=0}^{n} c_i x^i,$$

    where $c_n = 1$ by definition and $c_0 = 1$ by assumption.

    Some Facts and Definitions From Algebra

    1. Every polynomial f(x) with coefficients in GF(2) having f(0) = 1 divides $x^m + 1$ for some m. The smallest m for which this is true is called the period of f(x).
    2. An irreducible (cannot be factored) polynomial of degree n has a period which divides $2^n - 1$.
    3. An irreducible polynomial of degree n whose period is $2^n - 1$ is called a primitive polynomial.

    Theorem: A LFSR produces a PN-sequence if and only if its characteristic polynomial is a primitive polynomial.

    Ex: The characteristic polynomial of our previous example of an LFSR with n = 4 is
    $f(x) = x^4 + x^3 + x^2 + 1 = (x + 1)(x^3 + x + 1)$, and so it is not irreducible and therefore not primitive.

    Ex: $f(x) = x^4 + x^3 + x^2 + x + 1$ is a monic irreducible polynomial, since it has no linear factors and leaves remainder $x + 1$ when divided by $x^2 + x + 1$, the only irreducible quadratic over GF(2). However, $x^5 + 1 = (x + 1) f(x)$, and so it has period 5 and is not primitive.

    Ex: $f(x) = x^4 + x^3 + 1$ is a monic irreducible polynomial over GF(2). To find its period, we have to determine the smallest m so that f(x) divides $x^m + 1$. Clearly m > 4; also, by fact 2 above, the period divides $2^4 - 1 = 15$, thus it must be either 5 or 15. By trying the possibilities we get

    $x^5 + 1 = (x + 1)(x^4 + x^3 + 1) + (x^3 + x)$
    $x^{15} + 1 = (x^{11} + x^{10} + x^9 + x^8 + x^6 + x^4 + x^3 + 1)(x^4 + x^3 + 1)$
    Thus, f(x) has period 15 and so is a primitive polynomial.
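The trial divisions in these examples are easy to mechanize. The following is a minimal sketch, assuming polynomials over GF(2) are stored as Python integers with bit i holding the coefficient of $x^i$; the names `poly_mod` and `poly_period` are illustrative choices. It searches upward for the smallest m rather than using the divisibility shortcut from fact 2, which is simpler but slower for large degrees.

```python
def poly_mod(a, b):
    """Remainder of a divided by b, for GF(2) polynomials stored as bit masks
    (bit i is the coefficient of x^i). b must be non-zero."""
    deg_b = b.bit_length() - 1
    while a.bit_length() - 1 >= deg_b:
        # Cancel the leading term of a with a shifted copy of b (XOR = addition mod 2).
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a


def poly_period(f):
    """Smallest m >= 1 such that f(x) divides x^m + 1; requires f(0) = 1."""
    if f & 1 == 0:
        raise ValueError("f(0) must be 1")
    m = 1
    while poly_mod((1 << m) | 1, f) != 0:
        m += 1
    return m


print(poly_period(0b11101))  # x^4 + x^3 + x^2 + 1
print(poly_period(0b11111))  # x^4 + x^3 + x^2 + x + 1
print(poly_period(0b11001))  # x^4 + x^3 + 1
```

For an irreducible f of degree n, comparing `poly_period(f)` with $2^n - 1$ then decides whether f is primitive in the sense of fact 3.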

    The LFSR with n = 4, $c_0 = c_3 = 1$, $c_1 = c_2 = 0$ and starting state 0,0,0,1 gives the following:

    Time   LFSR State   Output
    0      0,0,0,1      -
    1      0,0,1,1      0
    2      0,1,1,1      0
    3      1,1,1,1      0
    4      1,1,1,0      1
    5      1,1,0,1      1
    6      1,0,1,0      1
    7      0,1,0,1      1
    8      1,0,1,1      0
    9      0,1,1,0      1
    10     1,1,0,0      0
    11     1,0,0,1      1
    12     0,0,1,0      1
    13     0,1,0,0      0
    14     1,0,0,0      0
    15     0,0,0,1      1

    Def: Let (f) denote the set of all sequences that can be produced from an LFSR with characteristic polynomial f(x).

    Since each starting state produces a different sequence (we are considering shifts as different), there are $2^n$ elements in (f), one for each starting state. The sum of two sequences in (f) is again in (f), since the sum will satisfy the same recursion relationship (i.e., the sum corresponds to a different starting state).

    We can characterize the elements of (f) in terms of the reciprocal polynomial of f.

    Def: The reciprocal polynomial of f(x) of degree n, denoted f*(x), is:

    $f^*(x) = x^n f(1/x) = c_0 x^n + c_1 x^{n-1} + \cdots + c_n.$
    Note that if f(x) = g(x)h(x) then f*(x) = g*(x) h*(x).

    Theorem: (f) = {t(x)/f*(x) where deg t(x) < n }.

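    Pf: We show that each element of (f) can be uniquely expressed in the desired form, and the result will follow since there are exactly $2^n$ binary polynomials of degree < n.

    Let S(x) be in (f), where $S(x) = \sum_{i \ge 0} s_i x^i$ and $f^*(x) = c_0 x^n + \cdots + c_n = \sum_{j=0}^{n} c_{n-j} x^j$. Then, for every $k \ge n$, the coefficient of $x^k$ in the product $S(x) f^*(x)$ is

    $$\sum_{j=0}^{n} c_{n-j} s_{k-j} = \sum_{i=0}^{n} c_i s_{(k-n)+i} = 0$$

    by the recursion relationship. Hence $S(x) f^*(x) = t(x)$ is a polynomial of degree < n, i.e., $S(x) = t(x)/f^*(x)$. Since $f^*(0) = c_n = 1$, $f^*(x)$ is invertible as a power series, so distinct sequences S(x) give distinct numerators t(x).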

    Lemma 1 : Let h(x) and f(x) be the characteristic polynomials of an m-stage and respectively n-stage LFSR. Then (h) is contained in (f) iff h(x) divides f(x).

    Lemma 2: Let S(x) be in (f) with S(x) = t(x)/f*(x). Then there exists an h(x) with h(x) dividing f(x) and h(x) not equal to f(x) with S(x) in (h) iff $\gcd(t(x), f^*(x)) \ne 1$.
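The representation S(x) = t(x)/f*(x) used in the theorem and lemmas above can be checked numerically. This is a sketch under stated assumptions: polynomials and power series are stored as coefficient lists (index i holds the coefficient of $x^i$), and the helper names `numerator` and `expand` are made up for illustration.

```python
def numerator(seq_prefix, coeffs):
    """t(x) = S(x) f*(x) truncated below degree n, from the first n sequence bits.

    coeffs = [c_0, ..., c_{n-1}]; c_n = 1 is implied.
    """
    n = len(coeffs)
    fstar = [1] + coeffs[::-1]  # f*(x) = c_n + c_{n-1} x + ... + c_0 x^n
    return [
        sum(fstar[j] & seq_prefix[k - j] for j in range(k + 1)) % 2
        for k in range(n)
    ]


def expand(t, coeffs, length):
    """First `length` coefficients of the power series t(x)/f*(x) over GF(2)."""
    n = len(coeffs)
    fstar = [1] + coeffs[::-1]
    s = []
    for k in range(length):
        bit = t[k] if k < len(t) else 0
        # Since fstar[0] = 1, s_k = t_k + sum_{j>=1} fstar[j] * s_{k-j}.
        for j in range(1, min(k, n) + 1):
            bit ^= fstar[j] & s[k - j]
        s.append(bit)
    return s


# f(x) = x^4 + x^3 + 1, i.e. c0 = c3 = 1, c1 = c2 = 0, starting state 0,0,0,1.
c = [1, 0, 0, 1]
t = numerator([0, 0, 0, 1], c)
print(t, expand(t, c, 15))
```

Expanding t(x)/f*(x) as a power series regenerates the LFSR output, so each starting state corresponds to exactly one numerator t(x) of degree less than n.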

    We will now consider the pseudo-randomness of PN sequences.

    G1: Since every non-zero state appears once per period and the leftmost bit of the state is the next output value of the sequence, it is easy to see that there are $2^{n-1}$ 1's and $2^{n-1} - 1$ 0's in any period.

    G2: For $k \le n - 2$, a run of length k will occur in the sequence whenever the leftmost k + 2 bits of the state are of the form 0111...110 (a block) or 100...001 (a gap). Since all non-zero states occur, the number of states of each of these two forms is $2^{n-k-2}$. There is one state 011...11, and it is followed by state 11...11 (since f is primitive, f(1) = 1), and that state is followed by 11...110; thus there is no block of size n - 1 and one block of size n. Similarly, there is no gap of size n and only one of size n - 1. We can therefore calculate the number of runs as

    $$2 + \sum_{k=1}^{n-2} 2 \cdot 2^{n-k-2} = 2 + (2^{n-1} - 2) = 2^{n-1},$$

    and of these, $2 \cdot 2^{n-k-2} = 2^{n-1}/2^k$ are of length k, i.e., a fraction $1/2^k$ of them, for each $k \le n - 2$.
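The run count can be checked on a concrete period. The sketch below assumes the sequence is given as one period in a Python list and treats it cyclically; the function name `run_statistics` is an illustrative choice, and the input is the period-15 output of the 4-stage LFSR with $c_0 = c_3 = 1$ tabulated earlier.

```python
from collections import Counter


def run_statistics(period_bits):
    """Count blocks (runs of 1s) and gaps (runs of 0s) by length over one
    period, treating the sequence as cyclic."""
    p = len(period_bits)
    # Rotate so the period starts at a run boundary (assumes both symbols occur).
    start = next(i for i in range(p) if period_bits[i] != period_bits[i - 1])
    bits = period_bits[start:] + period_bits[:start]
    blocks, gaps = Counter(), Counter()
    length = 1
    for prev, cur in zip(bits, bits[1:] + [None]):
        if cur == prev:
            length += 1
        else:
            # The run ending at `prev` is complete; file it by symbol and length.
            (blocks if prev == 1 else gaps)[length] += 1
            length = 1
    return blocks, gaps


pn = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
blocks, gaps = run_statistics(pn)
print(dict(blocks), dict(gaps))
```

With n = 4 this gives $2^{n-1} = 8$ runs in total, including the single block of length n and the single gap of length n - 1 predicted by the argument.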

    G3: Let $\{s_i\}$ be a PN-sequence and $\{s_{i+k}\}$ be the same sequence shifted k places. The sum of these two sequences satisfies the same recursion relation as both of them do and, being non-zero when the shift is out of phase, is a PN-sequence as well. The number of agreements in the two sequences will be the number of 0's in the sum and the number of disagreements is the number of 1's in the sum. So by G1,

    $$AC(k) = \frac{(2^{n-1} - 1) - 2^{n-1}}{2^n - 1} = -\frac{1}{2^n - 1}$$

    for all $1 \le k < 2^n - 1$.

    Thus we see that PN-sequences satisfy all of Golomb's conditions for pseudo-randomness. Turning now to the cryptographic conditions:

    C1: One can obtain sufficiently large periods by taking n large enough. In fact, n = 167 already gives a period of $2^{167} - 1 \approx 1.87 \times 10^{50} > 10^{50}$ (n = 166 falls just short, at about $9.4 \times 10^{49}$).

    C2: Being simple Boolean circuits, LFSR's are extremely easy to implement and are very fast.

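    C3: Zilch. Given 2n consecutive bits of the key sequence, $s_k, s_{k+1}, \ldots, s_{k+2n-1}$ (which an attacker who knows the corresponding plaintext can read off from the ciphertext), we can write down the n equations $s_{k+j+n} = c_0 s_{k+j} + c_1 s_{k+j+1} + \cdots + c_{n-1} s_{k+j+n-1}$, $j = 0, \ldots, n-1$, in the n unknowns $c_0, \ldots, c_{n-1}$. This system is non-degenerate and so has a unique solution. This gives the characteristic polynomial, and so the LFSR, to the cryptanalyst.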
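The attack in C3 amounts to solving a small linear system over GF(2). Here is a minimal sketch using plain Gauss-Jordan elimination; the function name `recover_coefficients` is an assumption for illustration, and the input bits are taken from the period-15 sequence of the 4-stage LFSR with $c_0 = c_3 = 1$ shown earlier.

```python
def recover_coefficients(bits, n):
    """Solve s_{k+n} = c_0 s_k + ... + c_{n-1} s_{k+n-1} (mod 2), k = 0..n-1,
    for c_0..c_{n-1} by Gauss-Jordan elimination over GF(2).

    `bits` must hold at least 2n consecutive sequence bits.
    """
    # Augmented matrix: row k is [s_k, ..., s_{k+n-1} | s_{k+n}].
    rows = [list(bits[k:k + n]) + [bits[k + n]] for k in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("system is degenerate")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


# Eight consecutive bits of the period-15 sequence 000111101011001...
print(recover_coefficients([0, 0, 0, 1, 1, 1, 1, 0], 4))
# A different window of the same sequence should give the same taps.
print(recover_coefficients([1, 1, 0, 1, 0, 1, 1, 0], 4))
```

The cost is on the order of $n^3$ bit operations, so even a register long enough to satisfy C1 falls to this attack almost immediately.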

    Thus, linear feedback shift registers should not be used in cryptographic work (despite this, LFSR's are still the most commonly used technique). However, this argument does not apply to non-linear FSR's, so we need to examine them next.

    An FSR with a possibly non-linear feedback function will still produce a periodic sequence (with a possible non-periodic beginning). If the period is p, then the p-stage LFSR with characteristic polynomial $1 + x^p$ and starting state equal to one period of the sequence will produce the same sequence; possibly other LFSR's will also. Hence, the following definition makes sense.

    Def: The linear equivalence of a periodic sequence S(x) is the length n of the smallest LFSR that can generate S(x).

    Theorem: Let S(x) be the generating function of a periodic binary sequence with period p. Let $S^{(p)}(x)$ be the truncated polynomial of degree at most p - 1 (the first p terms of S(x)). Then there exists a unique polynomial m(x) with

    a) S(x) is in (m), and
    b) if S(x) is in (h), then m(x) | h(x).

    m(x) is called the minimal characteristic polynomial of S(x), and

    $$m^*(x) = \frac{1 + x^p}{\gcd(S^{(p)}(x),\, 1 + x^p)}.$$

    Proof. Let S(x) be in (m), but not in (f) for any proper divisor f of m. We shall prove that m is unique. Note that $S(x) = S^{(p)}(x)/(1 + x^p)$. Since S(x) is in (m), we know that there exists a t(x) with degree < degree of m such that $S(x) = t(x)/m^*(x)$. By Lemma 2, gcd(m*(x), t(x)) = 1, so $t(x)/m^*(x)$ is the expression of $S^{(p)}(x)/(1 + x^p)$ in lowest terms, and therefore

    $$m^*(x) = \frac{1 + x^p}{\gcd(S^{(p)}(x),\, 1 + x^p)}.$$

    Cor: The linear equivalence of S(x) with period p is the degree of m(x) above.

    Ex: Consider the following non-linear feedback function (3-stage):
    $f(s_0, s_1, s_2) = s_0 + s_1 + s_1 s_2 + 1$. With starting state 1 0 1 we get the following sequence:

           State    Output
           1 0 1    -
           0 1 0    1
           1 0 0    0
           0 0 0    1
           0 0 1    0
           0 1 1    0
           1 1 1    0
           1 1 0    1
           1 0 1    1
    
    The period 8 sequence produced has the generating polynomial $S^{(8)}(x) = 1 + x^2 + x^6 + x^7$.
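The non-linear example can be reproduced with a small simulator that accepts an arbitrary feedback function. This is a sketch only; the name `fsr_outputs` is an illustrative assumption, and the shifting convention is the same as in the LFSR tables (output the leftmost bit, append the feedback bit on the right).

```python
def fsr_outputs(feedback, state, steps):
    """Run a feedback shift register with an arbitrary feedback function.

    Each step outputs the leftmost bit, shifts left, and appends
    feedback(state) reduced mod 2 on the right.
    Returns (list of output bits, final state).
    """
    state = tuple(state)
    outputs = []
    for _ in range(steps):
        outputs.append(state[0])
        state = state[1:] + (feedback(*state) % 2,)
    return outputs, state


def f(s0, s1, s2):
    # The non-linear feedback function from the example.
    return s0 + s1 + s1 * s2 + 1


outputs, final_state = fsr_outputs(f, (1, 0, 1), 8)
# Exponents of the generating polynomial: positions i where output bit i is 1.
exponents = [i for i, bit in enumerate(outputs) if bit]
print(outputs, final_state, exponents)
```

The register returns to its starting state after 8 steps, and the positions of the 1's in the output give the exponents of the generating polynomial.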

    Now $\gcd(S^{(8)}(x), 1 + x^8) = 1 + x$, so $m^*(x) = (1 + x^8)/(1 + x) = (1 + x)^7 = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6 + x^7$, and m(x) = m*(x). With starting state 1 0 1 0 0 0 1, this 7-stage LFSR produces the same sequence. The linear equivalence of our starting sequence is thus 7.
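The gcd computation in this example can be sketched with polynomials over GF(2) stored as Python integers (bit i is the coefficient of $x^i$). The helper names `poly_divmod`, `poly_gcd` and `minimal_reciprocal` are assumptions for illustration; the method is the one from the example, namely dividing $1 + x^p$ by its gcd with the truncated generating polynomial.

```python
def poly_divmod(a, b):
    """Quotient and remainder of GF(2) polynomials stored as bit masks (b != 0)."""
    q, deg_b = 0, b.bit_length() - 1
    while a.bit_length() - 1 >= deg_b:
        shift = a.bit_length() - 1 - deg_b
        q ^= 1 << shift
        a ^= b << shift
    return q, a


def poly_gcd(a, b):
    """Euclidean algorithm over GF(2)."""
    while b:
        a, b = b, poly_divmod(a, b)[1]
    return a


def minimal_reciprocal(period_bits):
    """(1 + x^p) / gcd(S^(p)(x), 1 + x^p) for one period of a binary sequence."""
    p = len(period_bits)
    s_p = sum(bit << i for i, bit in enumerate(period_bits))
    x_p_plus_1 = (1 << p) | 1
    return poly_divmod(x_p_plus_1, poly_gcd(s_p, x_p_plus_1))[0]


# One period of the sequence from the non-linear example: 1 0 1 0 0 0 1 1.
m_star = minimal_reciprocal([1, 0, 1, 0, 0, 0, 1, 1])
linear_equivalence = m_star.bit_length() - 1
print(bin(m_star), linear_equivalence)
```

The degree of the resulting polynomial is the linear equivalence of the sequence, here 7.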

    We see that the use of non-linear functions does not gain any cryptographic security since we can always find a LFSR to give the same sequence. In an attempt to get this security, various means of combining the outputs of LFSR's in a non-linear way have been attempted. Clearly, sums, shifts and products of outputs don't work. Most of the information on these techniques is classified (so, someone does believe that the required security can be obtained this way). One such approach which is in the public domain is the multiplexing algorithm of Jennings.

    Take an m-stage (the $a_i$) and an n-stage (the $b_i$) LFSR with primitive characteristic polynomials and non-zero starting states. Choose $h \le \min(m, \log_2 n)$ entries from the set of subscripts {0, 1, ..., m-1} and order them $0 \le i_1 < i_2 < \cdots < i_h < m$. At time t, define

    Let t be any one-one mapping from $\{0, 1, \ldots, 2^h - 1\}$ into $\{0, 1, \ldots, n-1\}$. Define the output of the multiplexed sequence to be

    Thm: If (m,n) = 1, this multiplexed sequence has period $(2^m - 1)(2^n - 1)$.

    Thm: If (m,n) = 1 and h = m - 1, then the sequence has linear equivalence $n(2^m - 1)$.

    Homework:

  • 1. Find all the primitive binary polynomials of degree 4.
  • 2. Prove that all the irreducible binary polynomials of degree 5 are primitive.
  • 3. Find a non-primitive irreducible binary polynomial of degree 6. Construct a LFSR with this polynomial as its characteristic polynomial and determine the statistics of the runs of this LFSR (i.e., how many runs of what lengths are there?)
  • 4. Find the linear equivalence and an LFSR which produces the period 7 sequence that starts 1 0 1 0 0 0 1 .... What starting state will give this sequence?
  • 5. Construct a multiplexed FSR by the Jennings method with $(m,n) \ne 1$ and determine its periods (i.e., the periods of the sequences for each of its starting states).