# An Example of Breaking a Monoalphabetic Substitution Cipher

Here we have our unknown cryptogram:
```
CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG
FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA
GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP
LCGJQ CXQKO GPQYD
```

### We organize this analysis into 7 stages:


• We start our cryptanalysis by trying to identify some very high frequency letters. (In the analysis, capital letters refer to the cryptogram and lower case letters to English.)
• G is the most frequently occurring letter, so we assume that it is an "e".
• "the" is the most frequent trigram in English, so we look for frequent trigrams that end in G, i.e., QAG, KOG, KUG and KJG.
• "t" should be a high frequency letter and "h" a medium frequency letter. Q (16 occurrences) and A (5) fit this better than K (12) and U (4), so QAG has a slight advantage over KUG. "th" is also the most frequent digram in English, and QA is the most frequent digram in the cryptogram, supporting QAG as the correct choice.
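The first stage can be sketched in a few lines of Python. This is only an illustration of the counting involved, not the method used to prepare these notes; the names `CRYPTOGRAM` and `the_candidates` are made up for the example, and trigrams are counted over the text with the block spaces removed.

```python
from collections import Counter

CRYPTOGRAM = (
    "CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG "
    "FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA "
    "GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP "
    "LCGJQ CXQKO GPQYD"
)


def the_candidates(cryptogram):
    """Return the most frequent letter and the repeated trigrams ending in it."""
    text = cryptogram.replace(" ", "")  # ignore the five-letter grouping
    top_letter, _ = Counter(text).most_common(1)[0]
    trigrams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    repeated = {
        tri: count
        for tri, count in trigrams.items()
        if tri.endswith(top_letter) and count > 1
    }
    return top_letter, repeated


top_letter, candidates = the_candidates(CRYPTOGRAM)
print(top_letter)   # the presumed "e"
print(candidates)   # repeated trigrams that could stand for "the"
```

The repeated trigrams returned here are the four listed above; choosing among them still relies on the frequency argument for "t" and "h".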

Having identified "t", "h" and "e" as Q, A and G respectively, we look at our cryptogram again.
```
      e e    e th e   e   e t     e  tth  tt  e
CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG
      te      the       t  e   e th   e      th
FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA
e               t    e    e      e     e   e t
GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP
  e t   t   e t
LCGJQ CXQKO GPQYD
```
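The aligned displays used throughout this example can be produced mechanically. The sketch below assumes a hypothetical helper `partial_decrypt` that takes one row of the cryptogram and a dictionary of the associations made so far; unknown letters are shown as blanks.

```python
def partial_decrypt(cipher_row, known):
    """Replace known cryptogram letters by plaintext, everything else by a blank."""
    return "".join(known.get(c, " ") for c in cipher_row)


row = "CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG"
known = {"G": "e", "Q": "t", "A": "h"}  # associations found so far

# Print the partial plaintext directly above the cryptogram row.
print(partial_decrypt(row, known))
print(row)
```

Adding an entry to `known` and printing again gives the display for the next stage.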
• F (occurring in block 7) follows "tth", so it must be a vowel, and a high frequency one like "a" or "o" (as opposed to "u" or "i"). "tha" is a high frequency trigram, and QAF is on our list, so we associate F with a.
• "an" is a high frequency digram and "and" a high frequency trigram. FJ and FJZ would seem to fit, so we associate J with n and Z with d.

(See the letter frequency count and the digram and trigram counts below.)


```
      e e    edth e   e   e t and e  ttha tt  e
CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG
and a te  a   the     n ta e   e th a ed   n th
FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA
e      a    nd  t    e    ean d ne    ne   e t
GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP
  ent   t   e t
LCGJQ CXQKO GPQYD

abcdefghijklmnopqrstuvwxyz
F..ZG..A.....J.....Q......
```
• e, t, a, o and n are the most frequent letters in English, and we have found four of them. The high frequency K must surely be the missing one, so we associate K with o.
• "he" and "re" are high frequency digrams. "he" is already accounted for by AG, and OG is on our list, so associating O with r is plausible, considering that "r" is a medium frequency letter and so is O.

```
 o o  e e o  edth e o e   e t and e  ttha tt  e
CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG
and a teo a   the  r  n ta e  re th a edr  n th
FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA
e r    aro  ndort  o e  orean done   one  re t
GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP
  ent   tor e t
LCGJQ CXQKO GPQYD

abcdefghijklmnopqrstuvwxyz
F..ZG..A.....JK..O.Q......
```
• Now looking at "..oreandone..one.." (above blocks 21 to 23), we can see ".. and one .. one ..". This suggests the common phrase "one by one", so we associate I with b and X with y.
• Examining the keyword substitution list, we clearly see the end of the alphabet in place: J, K, O, Q and X appear in alphabetical order. This forces us to associate s with P (the only letter between O and Q) and z with Y (Z is already taken by d). The spacing requires that q be associated with either M or N, but the low frequency of "q" favors N, which does not occur in the cryptogram, over M, which occurs 3 times.
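The argument from the substitution list can be checked with a short sketch. It assumes, as the text does, that the part of the list from n onward is in alphabetical order; the function name `tail_candidates` and the choice `start=13` (the position of n) are illustrative assumptions, not part of the original analysis.

```python
import string


def tail_candidates(key, start):
    """For each unknown slot from index `start` on, list the cryptogram
    letters that keep the tail of the substitution list in alphabetical order.

    `key` has 26 characters indexed by plaintext letter, '.' meaning unknown.
    """
    used = set(key) - {"."}
    result = {}
    i = start
    while i < len(key):
        if key[i] != ".":
            i += 1
            continue
        j = i
        while j < len(key) and key[j] == ".":
            j += 1  # key[i:j] is one run of unknown slots
        lo = key[i - 1] if i > start else "@"   # '@' sorts just before 'A'
        hi = key[j] if j < len(key) else "["    # '[' sorts just after 'Z'
        pool = [c for c in string.ascii_uppercase if lo < c < hi and c not in used]
        width = j - i
        for k in range(width):
            plain = string.ascii_lowercase[i + k]
            # slot k must leave k letters before it and width-k-1 after it
            result[plain] = pool[k:len(pool) - width + k + 1]
        i = j
    return result


# Substitution list after associating I with b and X with y.
partial_key = "FI.ZG..A.....JK..O.Q....X."
for plain, options in tail_candidates(partial_key, start=13).items():
    print(plain, options)
```

Slots with a single option (s and z here) are forced; the others are narrowed down and then settled by frequency or by the emerging plaintext.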

```
 oso  e e o  edth e o e   est andbe sttha tt  e
CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG
and a teo a   the  r  n ta e  resth a edr  n th
FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA
e r    aro  ndort  obe  orean doneb yone  re ts
GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP
  ent  ytor estz
LCGJQ CXQKO GPQYD

abcdefghijklmnopqrstuvwxyz
FI.ZG..A.....JK.NOPQ....XY
```
• The remaining high frequency letters are C and L, while the last of the high frequency letters in English is "i". The occurrence of ..CC.. in block 11 makes C a poor choice for i, so we associate L with i.
• The order of frequency for doubles in English is "ss", "ee", "tt", "ff", "ll", "mm", and "oo". Thus C is most likely f, l or m. f and m would give poor fits in block 11 so we associate C with l.
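Doubled letters are easy to pick out by machine. The following is a small sketch (the name `doubled_letters` is hypothetical) that scans the cryptogram, with block spaces removed, for adjacent identical letters.

```python
from collections import Counter

CRYPTOGRAM = (
    "CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG "
    "FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA "
    "GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP "
    "LCGJQ CXQKO GPQYD"
)


def doubled_letters(cryptogram):
    """Count each pair of adjacent identical letters."""
    text = cryptogram.replace(" ", "")
    return Counter(a + b for a, b in zip(text, text[1:]) if a == b)


print(doubled_letters(CRYPTOGRAM))
```

Q is already associated with t, so QQ is "tt"; CC is the only double still to be explained.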

```
loso  e elo  edth elo e liest andbe sttha tti e
CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG
and a teo a llthe ir in ta e  resth a edr  n th
FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA
eir    aro  ndort  obe  orean doneb yone  re ts
GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP
ilent lytor estz
LCGJQ CXQKO GPQYD

abcdefghijklmnopqrstuvwxyz
FI.ZG..AL..C.JK.NOPQ....XY
```
• From the keyword list, we can see that p should be associated with M: p lies between o (K) and q (N), and L is now taken by i.
• The rest of the letters can be determined by trial and error.
• Blocks 4 and 5 indicate U associated with v to get the word "loveliest".
• Block 8 indicates H associated with m to get "time".
• Blocks 10 and 11 indicate E associated with f to get "of all".
• Blocks 12 and 13 indicate R associated with g to get "vintage".
• Blocks 23 and 24 indicate T associated with c to get "crept".

```
losom e elo vedth elove liest andbe sttha ttime
CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG
andfa teofa llthe irvin tagep resth avedr  n th
FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA
eirc  paro  ndort  obef orean doneb yonec repts
GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP
ilent lytor estz
LCGJQ CXQKO GPQYD

abcdefghijklmnopqrstuvwxyz
FITZGERAL..CHJKMNOPQ.U..XY
```
• A final check of the substitution list gives the remaining associations:
• j with D
• k with B
• u with S
• w with V
• x with W

The cryptogram is finally solved (with keyword FITZGERALD). The last two letters, YD, decrypt to "zj" and appear to be nulls that fill out the final block.

Lo! some we loved, the loveliest and best
That Time and Fate of all their Vintage prest,
Have drunk their Cup a Round or two before
And one by one crept silently to rest.
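As a check on the solution, the substitution list can be rebuilt from the keyword and applied to the whole cryptogram. This is a minimal sketch assuming the usual keyword construction (keyword letters first, then the unused letters in alphabetical order); `keyed_alphabet` and `decrypt` are names chosen for the example.

```python
import string

CRYPTOGRAM = (
    "CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG "
    "FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA "
    "GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP "
    "LCGJQ CXQKO GPQYD"
)


def keyed_alphabet(keyword):
    """Keyword letters (repeats dropped) followed by the rest of the alphabet."""
    seen = []
    for c in keyword.upper() + string.ascii_uppercase:
        if c not in seen:
            seen.append(c)
    return "".join(seen)


def decrypt(cryptogram, keyword):
    """Map each cryptogram letter back to the plaintext letter at its position."""
    table = str.maketrans(keyed_alphabet(keyword), string.ascii_lowercase)
    return cryptogram.replace(" ", "").translate(table)


print(keyed_alphabet("FITZGERALD"))
print(decrypt(CRYPTOGRAM, "FITZGERALD"))
```

The printed alphabet should agree with every association made in the stages above, and the plaintext is the verse without spaces or punctuation.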


### Frequency Lists

```
                   Cryptogram                English (based on 135 letters)

G .......... 21              e .................... 17
Q .......... 16              t .................... 13
K .......... 12              a, o ................. 11
F,J,O ......  9              n, i ................. 10
C ..........  8              s ....................  9
L,P,Z ......  6              r ....................  8
A ..........  5              h ....................  7
U ..........  4              l, d .................  5
E,I,M,S ....  3              c, u .................  4
H,T,V,X ....  2              p,f,m,w ..............  3
B,D,R,Y ....  1              y,b,g ................  2
                             v,k ..................  1
```

### Digram and Trigram Counts

```
             Digrams in Cryptogram                  English Digrams

QA ...................... 5         th .................... 4
GP,JZ,OG,PQ ............. 4         he .................... 3
KO,FJ,CK,AG,UG .......... 3         an,in,er,re,es ........ 2
GC,GZ,GF,GL,GM,QF                    on,ea,ti,at,st
QQ,KU,KJ,FQ,JQ,JG                    en,nd,or,to,nt
LO,ZK,AF,EF,IG,SJ ....... 2          ed,is,ar,ou,te
                                     of,it,ha,se,et ....... 1

             Trigrams in Cryptogram                English Trigrams
                                                (in order of frequency)

GPQ ........................ 4             the
QAG, FJZ ................... 3             and
QAF,JZK,OGP,KOG                            tha
CKU,AGL,UGZ,GFJ                            ent
GLO,KUG,KJG ................ 2             ion
```
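Counts like those in the cryptogram columns can be generated with a general n-gram counter. The sketch below is an assumption about how such counts might be computed, not a record of how the tables were made: it counts over the text with block spaces removed, so n-grams that straddle two blocks are included, and individual entries can differ from the tables if they were compiled with a different convention.

```python
from collections import Counter

CRYPTOGRAM = (
    "CKPKH GVGCK UGZQA GCKUG CLGPQ FJZIG PQQAF QQLHG "
    "FJZEF QGKEF CCQAG LOULJ QFRGM OGPQA FUGZO SJBQA "
    "GLOTS MFOKS JZKOQ VKIGE KOGFJ ZKJGI XKJGT OGMQP "
    "LCGJQ CXQKO GPQYD"
)


def ngram_counts(cryptogram, n):
    """Count every run of n consecutive letters, ignoring block spaces."""
    text = cryptogram.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


letters = ngram_counts(CRYPTOGRAM, 1)
digrams = ngram_counts(CRYPTOGRAM, 2)
trigrams = ngram_counts(CRYPTOGRAM, 3)

print(letters.most_common(5))
print(digrams.most_common(5))
print(trigrams.most_common(3))
```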