Lesson 4

Frequency Analysis: The Code-Breaker's Superpower

Why E is a snitch — and how counting letters can crack almost any simple cipher

The secret that beats most ciphers

Imagine someone hands you a 500-letter ciphertext. You don’t know the key. Where do you even start?

You count the letters.

In English, letter frequencies are wildly uneven:

E is the most common letter, at about 13% of all letters in English text. Then come T (9%), A (8%), O (7.5%), I (7%), N (6.7%), S (6.3%), and R (6%). At the other end, J, Q, and Z each show up less than 0.2% of the time.

Most simple ciphers don’t hide this. If a cipher always replaces each letter with the same substitute, the cipher letter that took E’s place will still appear about 13% of the time.

So if you count letters in the ciphertext and find that H appears 13% of the time… it’s probably really an E underneath.
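The counting step can be sketched in a few lines of Python. This is only an illustration, not the lesson's own tool: the function name `letter_frequencies` is made up for this example, and it assumes that only the letters A to Z matter (case is ignored, everything else is skipped).

```python
from collections import Counter


def letter_frequencies(text):
    """Return {letter: percent of all letters} for A-Z, most common first.

    Case is ignored; digits, spaces, and punctuation are skipped.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return {}
    total = len(letters)
    # most_common() yields (letter, count) pairs sorted by descending count.
    return {letter: 100 * n / total for letter, n in Counter(letters).most_common()}


sample = "The quick brown fox jumps over the lazy dog."
freqs = letter_frequencies(sample)
# Show the three most frequent letters in the sample.
for letter, pct in list(freqs.items())[:3]:
    print(f"{letter}: {pct:.1f}%")
```

On a text this short the percentages will not match the English averages closely; the pattern only becomes reliable with a few hundred letters or more.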

Try it

Paste any English text below. Watch the bar chart show the real letter frequencies — and compare to the typical English pattern (teal bars).

Now erase it all and paste the ciphertext below to see the same pattern shift:

WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ. WKLV VHQWHQFH FRQWDLQV HYHUB OHWWHU RI WKH DOSKDEHW DW OHDVW RQFH.

(This is the same “quick brown fox” sentence, Caesar-shifted by 3.)

What to notice: The shape of the bar chart is the same as for the unencrypted sentence, just shifted 3 bars to the right (the bars for X, Y, and Z wrap around to A, B, and C). The biggest bar is now at H, because E became H under the shift of 3.
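To see why the counts simply move along the alphabet, here is a minimal sketch of a Caesar shift in Python. The helper name `caesar_shift` is hypothetical, and the sketch assumes the message uses only A to Z, with all other characters passed through unchanged.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_shift(text, shift):
    """Move each letter A-Z forward by `shift` places, wrapping past Z.

    Non-letters are left alone. A negative shift moves letters backward.
    """
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


plaintext = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG."
ciphertext = caesar_shift(plaintext, 3)
print(ciphertext)                    # WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ.
print(caesar_shift(ciphertext, -3))  # shifting back recovers the plaintext

# Each plaintext letter's count reappears at the letter 3 places later.
for letter in "EOT":
    moved = caesar_shift(letter, 3)
    print(letter, plaintext.count(letter), "->", moved, ciphertext.count(moved))
```

Because every occurrence of a letter is replaced by the same cipher letter, the counts themselves never change; only the labels under the bars do.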

The frequency fingerprint

Every language has a fingerprint. Here are the English top 8, roughly:

Letter      E     T    A    O    I    N    S    R
% of text   12.7  9.1  8.2  7.5  7.0  6.7  6.3  6.0

Memorize ETAOINSR — it’s the cheat code of cryptanalysis.

A few more tricks:

Why this beats Caesar instantly

For a Caesar cipher, you don’t even need to be clever. You just need to find the most common letter in the ciphertext and compute:

shift = position_of_cipher_E − position_of_E

If the most common letter is H (position 8) and E is position 5, then the shift is 8 − 5 = 3. Done. You just broke the cipher in one calculation. (If the subtraction ever comes out negative, add 26 to wrap around the alphabet.)
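The one-calculation attack could look like this in Python. It is a sketch under stated assumptions: the function names `guess_caesar_shift` and `caesar_decrypt` are invented for this example, the message is assumed to be English written with A to Z, and the most common cipher letter is assumed to stand for E.

```python
from collections import Counter


def guess_caesar_shift(ciphertext):
    """Guess the shift by assuming the most common cipher letter stands for E."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    if not letters:
        raise ValueError("ciphertext contains no letters")
    most_common = Counter(letters).most_common(1)[0][0]
    # position_of_cipher_E - position_of_E, with % 26 keeping the result in 0-25.
    return (ord(most_common) - ord("E")) % 26


def caesar_decrypt(ciphertext, shift):
    """Undo a Caesar shift on A-Z; other characters pass through unchanged."""
    out = []
    for ch in ciphertext.upper():
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


ciphertext = ("WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ. WKLV VHQWHQFH "
              "FRQWDLQV HYHUB OHWWHU RI WKH DOSKDEHW DW OHDVW RQFH.")
shift = guess_caesar_shift(ciphertext)
print(shift)                              # 3
print(caesar_decrypt(ciphertext, shift))
```

On short or unusual messages the most common letter may not be E, so the guess can be wrong. A reasonable fallback is to try the next most common cipher letters, or simply all 26 shifts, and keep the one that reads as English.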

Practice

Which letter is the most common in English?

In a long Caesar-ciphered message, the most common letter is R. What was the shift?

You see the 3-letter word GSI appear 14 times in a 600-word ciphertext. What is your best guess?

Which of these would make frequency analysis HARDER?