Death By Frequency – A Tale of Cryptanalysis
Cryptography is the process of making the intelligible unintelligible. Be it voice, text or visual data, cryptography can turn any type of message into apparent nonsense, a step called encryption, and then revert it to the original with decryption. The security of a cryptographic system rests on how difficult it is for a third party to break its decryption scheme. Cryptanalysis is the set of methods and techniques used to attempt such a break. One particular tool in that arsenal is called Frequency Analysis. Based on an extraordinarily simple idea, it is an extremely powerful tool for analysing and solving cryptograms.
In order to understand how cryptanalysis can break a process that is by design meant to be impregnable, it is necessary to know how a typical cipher works. A cipher is a well-defined procedure for encrypting and decrypting messages. The simplest cipher one is likely to come across is the Caesar cipher, named after Julius Caesar, who is said to have used it to communicate with Cicero in Rome while campaigning in Europe. The encryption scheme simply shifts each letter of the original message by three positions in the alphabet. For instance, the letter “A” becomes “D”, “B” becomes “E”, and so on, with the last letters of the alphabet wrapping around to the start. Even though childishly plain, simple substitution ciphers were prevalent and remained undeciphered for many years until Frequency Analysis came along. The discovery of the method is often attributed to Al-Kindi, an Arab polymath who lived in the 9th century.
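The three-position shift described above can be sketched in a few lines of Python. This is only an illustration: the function name `caesar_shift` is made up for this example, and the choice to upper-case the text and leave spaces and punctuation untouched is an assumption, not part of the historical cipher.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABC...Z"

def caesar_shift(text: str, shift: int = 3) -> str:
    """Shift every letter by `shift` positions, wrapping past Z back to A.

    Non-letters (spaces, punctuation) are passed through unchanged,
    which is an assumption made for this sketch.
    """
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

ciphertext = caesar_shift("ATTACK AT DAWN")   # "DWWDFN DW GDZQ"
plaintext = caesar_shift(ciphertext, -3)      # shifting back by three decrypts
print(ciphertext)
print(plaintext)
```

Decryption is just the same shift in the opposite direction, which is why the single function covers both steps.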
Frequency analysis of a language can be explained simply as counting how often letters, or groups of letters, occur in writing in that language. The distribution of frequencies obtained from such an analysis acts as a “fingerprint” of the language. In English the most frequent letters are E, T, A and O, while TH, ER, ON and AN are among the most common bigrams, or groups of two letters.
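Counting letters and bigrams is easy to try out. The sketch below is a minimal version, with hypothetical helper names `letter_frequencies` and `bigram_counts`; it assumes plain A to Z text and counts bigrams only inside words, which is one of several reasonable conventions.

```python
import re
from collections import Counter

def letter_frequencies(text: str) -> dict:
    """Relative frequency of each letter A-Z, most frequent first."""
    letters = re.findall(r"[A-Z]", text.upper())
    if not letters:
        return {}
    total = len(letters)
    return {letter: n / total for letter, n in Counter(letters).most_common()}

def bigram_counts(text: str) -> Counter:
    """Count pairs of adjacent letters, without crossing word boundaries."""
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts

sample = "Hello, World!"
for letter, share in letter_frequencies(sample).items():
    print(letter, round(share, 2))
print(bigram_counts("the then").most_common(2))
```

Run on a few pages of ordinary English rather than a toy sample, the output should approach the fingerprint described above.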
Letter frequency distribution for the English language (Source: Wikipedia)
With this knowledge a cryptanalyst can compute the frequency distribution of a given cipher-text and compare it with the “fingerprint” distribution to guess possible substitutions. For instance, assume that the letter Q is the most frequent letter in the enciphered message. Then it is quite probable that Q is the substitute for E. Similarly, if another frequent letter, say X, is often followed immediately by the letter N, it is likely that XN represents TH in the original message. This partial decryption gives the cryptanalyst clues for figuring out the other substitutions too. The entire process is strikingly similar to solving a crossword puzzle. The weakness exploited by a frequency attack is that each plain-text letter is always enciphered to the same cipher-text letter, so the letter frequencies of the language survive encryption. In the Caesar cipher this happens because every letter is shifted by the same value, three. A cipher that uses a single fixed substitution in this way is called a mono-alphabetic cipher.
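For a Caesar shift, the guess "the most frequent cipher-text letter stands for E" is already enough to recover the whole key. The sketch below tries exactly that one guess. The helper names are invented for this example, and the sample message was chosen so that E really is its most frequent letter; on short or unusual texts the guess can easily be wrong, and a real attack would try the other likely letters as well.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

def shift_text(text: str, shift: int) -> str:
    """Caesar-shift the letters of `text`; other characters are unchanged."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + shift) % 26] if c in ALPHABET else c
        for c in text.upper()
    )

def guess_caesar_shift(ciphertext: str) -> int:
    """Guess the shift by assuming the most frequent letter encodes E."""
    letters = [c for c in ciphertext.upper() if c in ALPHABET]
    if not letters:
        raise ValueError("cipher-text contains no letters")
    top_letter = Counter(letters).most_common(1)[0][0]
    return (ALPHABET.index(top_letter) - ALPHABET.index("E")) % 26

message = "MEET ME HERE BEFORE THE SEVENTH EVENING"
ciphertext = shift_text(message, 3)
guessed = guess_caesar_shift(ciphertext)
recovered = shift_text(ciphertext, -guessed)
print(guessed, recovered)
```

A general substitution cipher needs the same idea applied letter by letter and bigram by bigram, which is where the crossword-like guessing comes in.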
With the advent of Frequency Analysis, every mono-alphabetic cipher was rendered obsolete. As a replacement, a French diplomat, Blaise de Vigenère, devised a new cipher in the 16th century that was significantly more difficult to break using Frequency Analysis. The strength of the Vigenère cipher lies in the fact that it encrypts the same plain-text letter into multiple cipher-text letters. Hence it is called a poly-alphabetic cipher. For this purpose it uses a “Key Phrase”. To encrypt a given text, the sender first writes the Key Phrase repeatedly under the text until the lengths match. Imagine, for instance, that the Key Phrase is “apple” and the message is “hidden and locked”. After the first step, with the spaces dropped, this would look like:
text : “hiddenandlocked”
key : “appleappleapple”
Then the sender adds the alphabet position of each letter in the plain-text row to the position of the corresponding letter in the key row. If the sum is greater than 26, it wraps around to the beginning of the alphabet, that is, 26 is subtracted from it. The letter at the resulting position in the alphabet is the enciphered letter. In the example the first letter of the plain-text, ‘H’, is the 8th letter of the alphabet. This value is added to the alphabetic position of the first letter of the key, ‘A’, which is 1. The result, 9, corresponds to ‘I’ in the alphabet. Therefore the letter ‘H’ is enciphered to the letter ‘I’. This ability of the Vigenère cipher to encrypt the same letter to multiple enciphered letters makes it a formidable opponent to the classic frequency attack. The frequency distribution of a Vigenère cipher-text is significantly “flatter” than the fingerprint, with all frequencies taking more or less similar values. The longer the key, the smaller the differences between the frequencies become.
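The addition rule is easy to check by machine. The sketch below follows the convention used in this article, where A counts as position 1, so a key letter ‘A’ shifts by one. Many textbook descriptions of the Vigenère cipher count A as 0 instead, which moves every cipher-text letter back by one place. The function names are made up for this example.

```python
import string

ALPHABET = string.ascii_uppercase

def _letters(text: str) -> list:
    """Keep only the letters A-Z, upper-cased (spaces are dropped)."""
    return [c for c in text.upper() if c in ALPHABET]

def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Add 1-based alphabet positions of text and key, wrapping past 26."""
    key_letters = _letters(key)
    out = []
    for i, ch in enumerate(_letters(plaintext)):
        total = (ALPHABET.index(ch) + 1) + (ALPHABET.index(key_letters[i % len(key_letters)]) + 1)
        if total > 26:
            total -= 26
        out.append(ALPHABET[total - 1])
    return "".join(out)

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Reverse the addition: subtract the key position, wrapping below 1."""
    key_letters = _letters(key)
    out = []
    for i, ch in enumerate(_letters(ciphertext)):
        pos = (ALPHABET.index(ch) + 1) - (ALPHABET.index(key_letters[i % len(key_letters)]) + 1)
        if pos < 1:
            pos += 26
        out.append(ALPHABET[pos - 1])
    return "".join(out)

ciphertext = vigenere_encrypt("hidden and locked", "apple")
print(ciphertext)                             # IYTPJOQDPQPSAQI
print(vigenere_decrypt(ciphertext, "apple"))  # HIDDENANDLOCKED
```

Note how the four D's of the message come out as T, P, P and I, depending on which key letter happens to sit underneath each of them.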
The Vigenère cipher remained a daunting challenge for cryptanalysts for centuries, until in 1854 the legendary British mathematician and engineer Charles Babbage found a method to break it. Babbage's approach consisted of three steps. In the first step he looked for sequences of letters that are repeated in the cipher-text. He then made the reasonable assumption that, although a small number of these repetitions could be pure coincidence, the majority arise when the same plain-text sequence is enciphered by the same letters of the key, in other words by repeated instances of the key phrase. The distances between such repetitions must then be multiples of the key length, so from them he could work out the likely length of the key. It was then a matter of plotting a frequency distribution of the letters taken at intervals of the key length, after which he could employ the same strategy used to break Caesar shifts. The only difference is that he needed to analyse several frequency distributions, one for each letter of the key phrase. Today a simple computer program implementing Babbage’s method can break almost any Vigenère cipher within a fraction of a second, and for that reason the cipher is no longer considered safe.
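The first two steps, finding repeated sequences and turning the distances between them into key-length candidates, can be sketched as follows. This approach is commonly known today as Kasiski examination, after Friedrich Kasiski, who published it in 1863. The function names, the default sequence length of three and the cap of 20 on the key length are all assumptions made for this sketch. The toy input is hand-built to contain one repeated sequence; it is not a real cipher-text.

```python
from collections import Counter, defaultdict

def repeat_distances(ciphertext: str, seq_len: int = 3) -> list:
    """Distances between successive occurrences of each repeated sequence."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - seq_len + 1):
        positions[ciphertext[i:i + seq_len]].append(i)
    distances = []
    for starts in positions.values():
        distances.extend(b - a for a, b in zip(starts, starts[1:]))
    return distances

def rank_key_lengths(distances: list, max_len: int = 20) -> list:
    """Count, for each candidate key length, how many distances it divides."""
    votes = Counter()
    for d in distances:
        for k in range(2, max_len + 1):
            if d % k == 0:
                votes[k] += 1
    return votes.most_common()

def split_columns(ciphertext: str, key_length: int) -> list:
    """Group the letters enciphered by the same key letter (third step)."""
    return [ciphertext[i::key_length] for i in range(key_length)]

toy = "XYZABCDEFGXYZHIJKLMNXYZ"       # "XYZ" repeats every 10 letters
distances = repeat_distances(toy)
print(distances)                      # [10, 10]
print(rank_key_lengths(distances))    # [(2, 2), (5, 2), (10, 2)]
print(split_columns("ABCDEFGHIJ", 5)) # ['AF', 'BG', 'CH', 'DI', 'EJ']
```

The ranking usually leaves a handful of candidates, since every divisor of the true key length also collects votes. Each candidate is tested by splitting the cipher-text into columns and checking whether every column, treated as a Caesar shift, shows the familiar fingerprint.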
Cryptanalysis is as fascinating as cryptography itself. In a constant tug-of-war, cryptanalysts and cryptographers devise ever more refined attacks and ciphers to stay ahead of each other. Frequency analysis by itself is not considered a viable attack on most modern ciphers, which have become extremely sophisticated. Yet the basic principle behind it, that patterns and repetitions are powerful tools for breaking ciphers, is still at the foundation of even modern cryptanalytic techniques.