Dorabella & D'Agapeyeff Ciphers - Solutions
D'Agapeyeff Cipher - Solution
Background
This cipher was included as a challenge cipher at the end of 'Codes and Ciphers - A History of Cryptography', an elementary book on cryptography by the Russian-born English cryptographer and cartographer Alexander d'Agapeyeff. The book was first published in 1939. The cipher was included again in the 1949 reprint but omitted from later editions (1952 onwards), after d'Agapeyeff reportedly confessed to having forgotten how he had originally created it. It has since reappeared in a recent paperback edition from Read Books Ltd., available on Amazon.
Acknowledgements
I am grateful to members of the Yahoo D'Agapeyeff Group (https://groups.yahoo.com/neo/groups/Dagapeyeff_Cipher/info) for supplying me with crucial bits of information present only in the original version of the book. Without these, a solution would (IMHO) be nigh impossible.
The Cipher
For those not familiar with the cipher, it can be seen (as it appears in the book) on the Wikipedia page.
General
Much has been written about the cipher over the years, there being (I think) general agreement on at least the following interesting features:
- the final 3 digits (000) are fillers to enable the message to be split into an exact number of 5-digit groups,
  as was standard practice in the days when telegraphic transmission was the norm
- the remaining 392 digits can be readily split into 2-digit groups, with the first digit chosen from [67890] and the second from [12345]
  e.g. the message begins 75628 28591, which can be rewritten as 75 62 82 85 91 and so on
- a length of 196 groups suggests a 14 x 14 square has been used at some stage of the encipherment process (Transposition)
- a Polybius Square (5 x 5) may well have been used at some stage of the encipherment process (Substitution)
- only 18 of the 25 possible combinations of [67890] [12345] are present in the message
  groups 61, 73, 95, 01, 02, 03 and 05 are never seen
- only 13 groups occur regularly (9 or more occurrences) - the other 5 groups are low-scoring (1, 2 or 3 occurrences)
  the low-scoring groups (with number of occurrences) are 71 (1), 92 (3), 93 (2), 94 (1) and 04 (1)
- if the 2-digit groups are written horizontally into a 14 x 14 square, the low-scoring groups all occur in column 14
  it follows that, if the message is inscribed vertically into the same square, the low-scoring groups will all occur in row 14
- nulls may be present in the cipher as a way of thwarting any statistics-based attack (described by some as Concealment)
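The first few features in the list above are mechanical enough to sketch in code. What follows is only an illustration in Python, not the tooling used for this analysis: the function names split_dinomes and write_into_square are my own, and the demo uses just the ten opening digits quoted above rather than the full message.

```python
from collections import Counter


def split_dinomes(cipher_text, filler_digits=3):
    """Strip spaces, drop trailing filler digits, split into 2-digit groups."""
    digits = "".join(ch for ch in cipher_text if ch.isdigit())
    if filler_digits:
        digits = digits[:-filler_digits]
    if len(digits) % 2:
        raise ValueError("odd number of digits after removing filler")
    return [digits[i:i + 2] for i in range(0, len(digits), 2)]


def write_into_square(groups, side=14, by_rows=True):
    """Write groups into a side x side square, horizontally or vertically."""
    if len(groups) != side * side:
        raise ValueError("need exactly side * side groups")
    rows = [groups[r * side:(r + 1) * side] for r in range(side)]
    if by_rows:
        return rows
    # Writing vertically is simply the transpose of writing horizontally,
    # so column 14 of one layout becomes row 14 of the other.
    return [list(col) for col in zip(*rows)]


# Demo on the opening of the message only (no filler to strip here).
opening = split_dinomes("75628 28591", filler_digits=0)
print(opening)           # ['75', '62', '82', '85', '91']
print(Counter(opening))  # frequency count of each 2-digit group
```

Fed the full 395-digit message, split_dinomes should return the 196 groups discussed above, and Counter then gives the occurrence counts.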
Analysis and System Diagnosis
My initial (gut) reaction on first reading about the cipher was that an understanding of the low-scoring groups would prove vital to achieving any sort of breakthrough. I decided to focus on these groups - partly because my poor knowledge of statistical analysis would prevent me from doing anything meaningful that hadn't already been done by those better qualified.
For what it's worth, the Index of Coincidence (I.C.) is a favourite tool of the cryptanalyst and I was able to verify, using online facilities, that the I.C. for the cipher is ~1.89 (if we ignore column 14 and take the alphabet length to be 25). However, by ignoring column 14, we are really reducing the alphabet length to just 13 - which rescales the I.C. to ~0.98 (1.89 x 13/25). Hmmm! This suggested to me some sort of keyed or polyalphabetic system. Could the low-scoring groups be a result of some sort of key breakout? Or plaintext breakout? Either would mean 'final cipher written in vertically, read out horizontally' was the more likely set-up, with the breakout in the last row.
At this point, I decided I had to purchase a copy of the book. [My funds only went as far as a new edition, bought online from Amazon (published by Read Books Ltd. in 2013). Amazingly, the cipher message appears twice!]
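For anyone wanting to repeat the I.C. arithmetic without online facilities, here is a minimal Python sketch. It assumes the usual normalised form of the Index of Coincidence (alphabet length multiplied by the chance that two randomly chosen symbols match), which I believe is what the online tools report. The names normalised_ic and rescale_ic are mine.

```python
from collections import Counter


def normalised_ic(symbols, alphabet_size):
    """Index of Coincidence scaled by alphabet size (about 1.0 for random text)."""
    counts = Counter(symbols)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two symbols")
    coincidences = sum(c * (c - 1) for c in counts.values())
    return alphabet_size * coincidences / (n * (n - 1))


def rescale_ic(ic, old_size, new_size):
    """Re-express a normalised I.C. for a different assumed alphabet length."""
    return ic * new_size / old_size


# The figure quoted in the text, re-based from 25 symbols to 13.
print(round(rescale_ic(1.89, 25, 13), 2))  # 0.98
```

To apply normalised_ic to the cipher, pass it the list of 2-digit groups (minus column 14) together with the assumed alphabet length.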
Initially, I couldn't see anything I didn't already know about the cipher but, on reading through the book, several things struck me as odd - either by their very nature or because of their position in the book:
- D'Agapeyeff gives greater prominence to the Polybius Square than the other books on cryptanalysis which I have
- D'Agapeyeff gives greater prominence to Porta than the other books on cryptanalysis which I have
- D'Agapeyeff introduces (in the sub-section on solving a combination cipher) two variant systems - which seems illogical
- D'Agapeyeff then mentions substituting numbers (10 upwards to 99) for letters but doesn't mention 0 to 9 (or 1 to 0)
- D'Agapeyeff then mentions using a date as an alternative way of forming a Transposition Keyword - not necessarily odd, maybe a clue
At this point, I was stuck. I tried (over days/weeks/months) several things - e.g. mononome/dinome substitution with various parameters. I produced trial setups and generated test messages - but I couldn't conceive of any sensible sort of system which yielded an alphabet of length 13 (the number of occurring groups if one ignores the low-scoring groups above). I knew that a Porta system normally comprised two half-alphabets, each of length 13 - but I was proving to be stubbornly dim until (by chance) I revisited the sub-section on Combined Substitution - Transposition Cipher. The simplified Porta example, apart from being strangely out of place, seemed unremarkable - until I realised that each alphabet setup comprised half-alphabets of length 10 (NOT the normal 13).
This got me wondering. What if a half-alphabet comprised the letters QWERTYUIOP (the top row of the standard typewriter keyboard - sorry, showing my age!), with each equating to the numbers 1, 2, ..., 9, 0? At first glance, that doesn't help, since the Input (pre-enciphering) would be restricted to just 10 values, ruling out most systems. Then the penny dropped! We have already seen that the cipher groups comprise the digits [67890] [12345]. If we split each group into its 2 component digits and then change digits into letters using QWERTYUIOP, our Input will consist entirely of those 10 values. [Some readers will no doubt recognise this process as fractionation.] And hence, our Output will be restricted to just 10 values. An example setup is shown below. [And by introducing additional Output half-alphabet setups, the total number of unique Output cipher values can easily be fixed at 13. From there, the Output letters can be re-substituted to yield 2-digit groups, as per the final cipher.]
Example Porta set-up
Q W E R T Y U I O P Input
A B C D F G H J K L Output
Total Output set is: ABCDFGHJKLMNZ - 13 letters, avoiding QWERTYUIOP (and SVX) - see later for Polybius recovery
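The two look-ups involved (digit to QWERTYUIOP letter, then Input letter to Output letter) can be sketched in Python as below. This is a simplified illustration only. It uses the single Output row from the example set-up above and makes no attempt to model the additional half-alphabet setups or any key; the names fractionate and substitute are my own.

```python
KEYBOARD_ROW = "QWERTYUIOP"    # stands for the digits 1, 2, ..., 9, 0
DIGITS = "1234567890"
EXAMPLE_OUTPUT = "ABCDFGHJKL"  # Output row of the example set-up (one alphabet only)

DIGIT_TO_LETTER = dict(zip(DIGITS, KEYBOARD_ROW))
INPUT_TO_OUTPUT = dict(zip(KEYBOARD_ROW, EXAMPLE_OUTPUT))


def fractionate(groups):
    """Split each 2-digit group into single digits and map them to letters."""
    return "".join(DIGIT_TO_LETTER[d] for group in groups for d in group)


def substitute(letters, table=INPUT_TO_OUTPUT):
    """Apply one Input -> Output half-alphabet substitution."""
    return "".join(table[ch] for ch in letters)


letters = fractionate(["75", "62"])
print(letters)              # UTYW
print(substitute(letters))  # HFGB
```

With only one Output row the result can never contain more than 10 distinct letters, which is the restriction noted above.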
Armed with this possibility, I looked again at the examples contained in the book, feeling sure that a Polybius square was involved at some stage of the enciphering process. D'Agapeyeff invariably uses a Keyword in constructing a mixed alphabet to fill his Square (5 x 5, with the letter J normally omitted). For example, with Keyword BRISTOL, the 25-letter alphabet would be BRISTOLACDEFGHKMNPQUVWXYZ - and this alphabet would then be used to fill the 25 cells of the square in some predetermined order (by row, by column etc.). If BIRMINGHAM were the Keyword, this would be reduced to BIRMNGHA (since repeated letters in the Keyword must first be removed), followed by the remaining letters of the alphabet, giving BIRMNGHACDEFKLOPQSTUVWXYZ.
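The Keyword construction just described is easy to reproduce. Here is a short Python sketch, where keyed_alphabet and fill_by_rows are my own names and filling by row is just one of the possible fill orders.

```python
import string


def keyed_alphabet(keyword, omit="J"):
    """Keyword with repeats removed, followed by the unused letters (J omitted)."""
    base = [ch for ch in string.ascii_uppercase if ch not in omit]
    seen = []
    for ch in keyword.upper():
        # Skip repeats, the omitted letter and anything that is not a letter.
        if ch in base and ch not in seen:
            seen.append(ch)
    return "".join(seen + [ch for ch in base if ch not in seen])


def fill_by_rows(alphabet, side=5):
    """Fill a side x side square row by row (one possible fill order)."""
    return [alphabet[r * side:(r + 1) * side] for r in range(side)]


print(keyed_alphabet("BRISTOL"))     # BRISTOLACDEFGHKMNPQUVWXYZ
print(keyed_alphabet("BIRMINGHAM"))  # BIRMNGHACDEFKLOPQSTUVWXYZ
print(fill_by_rows(keyed_alphabet("BRISTOL")))
```

Note that this sketch simply drops a J if one occurs in the Keyword, rather than treating it as I.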
I examined again the 12 groups which either never appear (61, 73, 95, 01, 02, 03 and 05) or are low-scoring (71, 92, 93, 94 and 04). In order, they are:
61 71 73 92 93 94 95 01 02 03 04 05
and compared them with the 13 cipher groups which occur regularly (and comprise 188 of the total 196 groups):
62 63 64 65 72 74 75 81 82 83 84 85 91
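As a sanity check on the bookkeeping, the two lists should between them account for all 25 possible groups, and the regular groups for 188 of the 196 occurrences. A small Python check follows, using only the figures quoted in the text. The variable names are mine.

```python
from itertools import product

ROWS, COLS = "67890", "12345"
ALL_GROUPS = {r + c for r, c in product(ROWS, COLS)}

REGULAR = "62 63 64 65 72 74 75 81 82 83 84 85 91".split()
NEVER_SEEN = "61 73 95 01 02 03 05".split()
LOW_SCORING = {"71": 1, "92": 3, "93": 2, "94": 1, "04": 1}  # group: occurrences

others = set(NEVER_SEEN) | set(LOW_SCORING)

print(len(REGULAR), len(others))            # 13 12
print(set(REGULAR) | others == ALL_GROUPS)  # True
print(set(REGULAR) & others)                # set()
print(196 - sum(LOW_SCORING.values()))      # 188
```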
From the examples in the book (and bearing in mind the Porta split alphabets which might be present), I reasoned that I should be looking for a Keyword (duplicate letters removed) which included three letters from the set [QWERTYUIOP]; and that these three letters would equate to groups 61, 71 and 73 (in some order). Also, group 91 might equate to letter I/J, being the last regularly-occurring cipher group and immediately before the first plain group (other than the 3 groups already mentioned which would seem to be part of the Keyword). Finally, the row coordinates (at least) might be in order (67890).
I asked myself (several times more than I care to mention!) "which Keyword could D'Agapeyeff possibly have chosen?" Doh! There it was, staring me in the face - D'AGAPEYEFF (or rather, DAGPEYF after duplicate letters are removed). A trial construction was made - which turned out to be almost correct. A couple of tweaks (well - three actually), assisted by later analysis, sorted things to my satisfaction.
Breakthrough
My focus now was back on the low-scoring groups. Writing the message vertically into a 14 x 14 matrix, here are the 14 groups which comprise the bottom row. The 8 low-scoring groups are those in positions 3, 4, 7, 8, 9, 10, 12 and 14:
84 62 93 92 85 64 04 94 71 93 74 92 83 92
The pattern of 5 blocks of 2 groups then 4 single groups intrigued me. Knowing that the book was published sometime in 1939 (but not exactly when), I wondered if this might be a date. [I was now reasonably happy that fractionation had been used.] Convinced that a single letter (L) is transformed into 2 groups and a number (N) into 1 group, I reasoned that the sequence above may well have started out as L L L L L N N N N (possibly March 1939? or April 1939? - these being the only 5-letter months). A further thought occurred! In the construction of the Polybius Square, an alphabetic sort was employed, so "let's do the same with March and April", I said to myself. 'March' becomes ACHMR. Nothing remarkable there - but 'April' when sorted becomes AILPR. Notice that the I, P and R (from [QWERTYUIOP]) are now perfectly aligned with the low-scoring letters (the 2nd, 4th and 5th) of the L L L L L sequence above. Now I was sure I was onto something!
But I needed confirmation of the month. The Yahoo group came up trumps! Replying to my email query, two members of the group (in possession of an original 1939 edition copy) were able to confirm that the month of publishing wasn't known BUT that the book opened with a short note of thanks, signed:
quote
Alexander d'Agapeyeff
LONDON
April 1939
unquote
I also had confirmation that the note of thanks is the same as the one contained in my 2013 version. Subsequent email exchanges between group members revealed that "LONDON April 1939" was (perhaps unsurprisingly) missing from the 1949 edition and that the cipher itself was removed from the 1952 and subsequent editions. [D'Agapeyeff's son was apparently later quoted as saying that his father had become embarrassed at the amount of time people had spent trying to solve the cipher, as he didn't have a copy of the solution written down.]
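The alignment between the sorted month name and the low-scoring pairs in the bottom row, noted before the quoted sign-off, can be checked mechanically. The Python sketch below rests on the same assumptions as the text: the first 10 groups of the bottom row form 5 letter pairs, and a letter is "marked" if it belongs to QWERTYUIOP. The names pair_flags and letter_flags are mine.

```python
KEYBOARD_ROW = set("QWERTYUIOP")
LOW_SCORING = {"71", "92", "93", "94", "04"}
BOTTOM_ROW = "84 62 93 92 85 64 04 94 71 93 74 92 83 92".split()


def pair_flags(groups, n_pairs=5):
    """Flag each 2-group block that contains a low-scoring group."""
    pairs = [groups[2 * i:2 * i + 2] for i in range(n_pairs)]
    return [any(g in LOW_SCORING for g in pair) for pair in pairs]


def letter_flags(word):
    """Sort the letters, then flag those found on the QWERTYUIOP row."""
    return [ch in KEYBOARD_ROW for ch in sorted(word.upper())]


target = pair_flags(BOTTOM_ROW)
for month in ("MARCH", "APRIL"):
    print("".join(sorted(month)), letter_flags(month) == target)
# ACHMR False
# AILPR True
```

This only shows that the pattern fits 'April' and not 'March'. It is a consistency check, not proof of the system.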
With "D'Agapeyeff" before and "April 1939" after, I now began to wonder if "LONDON" featured in some way - but with only 6 letters (or 12 digits) to play with, rather than 7 (or 14), I concluded the answer must be NO. How wrong can one be?!
Re-reading the section on 'Combination Substitution - Transposition', with the 2 following sections being the oddly-placed Chinese Fill Transposition and Simplified Porta, I was now having second thoughts about the Row 14 versus Column 14 layout - and changed my mind to one of Column 14. It doesn't affect the groups and the analysis thus far - but it changed my thinking on what might be underlying the 14 groups. Instead of being a Key (or Plain) breakout, it would now have to be a break-in!! Perhaps 'April 1939' was a Transposition Key? [This would be unusual in that the Key would be part of the cipher - but hey!] I tried to come up with a solution to the column - but couldn't really make sense of the 14 groups - except that, if a Porta was involved, the cipher letters (from Polybius Square recoveries) should span the half-alphabet CDFGHKLMNZ. [MNZJABCDFG is also a candidate - but involves an 'out-the-back-door, in-the-front-door' loop-around.]
I looked again at LONDON as a possible suspect - but I still lacked evidence. Perhaps a Railfence Transposition? If letter J doubles as letter I, we have: