Letter Frequency Cryptanalysis

Letter-frequency histogram, IC, and chi-squared comparison against English.

Overview

A cryptanalysis workbench for classical ciphers. Paste in any ciphertext and the tool produces a letter-frequency histogram, computes the Index of Coincidence (IC), and runs a chi-squared comparison against expected English letter frequencies. Together these tell you a lot about what kind of cipher you're looking at and how to attack it.

Cryptography students, CTF players, ARG solvers, and amateur codebreakers reach for this constantly. It's the first move when staring at a wall of unfamiliar letters: figure out the language, figure out the cipher family, then choose your attack.

How it works

Frequency analysis exploits the fact that English text isn't random: E appears about 12% of the time, T about 9%, followed by A, O, I, N, S, and H. A simple substitution cipher relabels the letters but preserves the shape of this distribution, so the most common ciphertext letter probably stands for E. The Index of Coincidence measures how "peaky" the distribution is: English text has an IC around 0.067, uniformly random text around 0.038. Vigenère ciphertext falls between the two, staying close to English for very short keys and approaching the random value as the key gets longer, then jumps back to English levels when you split the text by the right key length.
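As a rough illustration of the two measurements described above, here is a minimal Python sketch. The function names letter_counts and index_of_coincidence are made up for this example and are not the tool's actual code; it assumes only the letters A-Z matter and that everything else is ignored.

```python
from collections import Counter


def letter_counts(text: str) -> Counter:
    """Count A-Z letters, ignoring case, spaces, digits and punctuation."""
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")


def index_of_coincidence(text: str) -> float:
    """Chance that two letters drawn from different positions are equal."""
    counts = letter_counts(text)
    n = sum(counts.values())
    if n < 2:
        return 0.0  # not enough letters to form a pair
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


print(letter_counts("attack at dawn").most_common(2))  # [('A', 4), ('T', 3)]
```

On very short inputs the IC swings widely, so treat it as a hint rather than a verdict until you have a few hundred letters.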

The chi-squared statistic compares observed letter counts against the counts you would expect from English text of the same length, with lower scores meaning a closer fit. Combined, these three metrics narrow down whether you're facing a Caesar shift, a substitution cipher, a Vigenère, or genuinely random output.
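The chi-squared comparison can be sketched in a few lines of Python, and it doubles as a Caesar-shift solver: try all 26 shifts and keep the one with the lowest score. The frequency table below uses commonly cited approximate values, which is an assumption; the tool's own baseline may differ. The names chi_squared, shift_letters and best_caesar_shift are hypothetical.

```python
from collections import Counter
from string import ascii_uppercase

# Approximate English letter frequencies in percent, A through Z.
# Commonly cited values (an assumption; the tool's table may differ).
ENGLISH_PERCENT = dict(zip(ascii_uppercase, [
    8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77, 4.03, 2.41,
    6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98, 2.36, 0.15, 1.97, 0.07,
]))


def chi_squared(text: str) -> float:
    """Sum of (observed - expected)^2 / expected over A-Z; lower is closer."""
    counts = Counter(ch for ch in text.upper() if ch in ENGLISH_PERCENT)
    n = sum(counts.values())
    if n == 0:
        return float("inf")  # nothing to compare
    return sum(
        (counts[letter] - n * pct / 100) ** 2 / (n * pct / 100)
        for letter, pct in ENGLISH_PERCENT.items()
    )


def shift_letters(text: str, shift: int) -> str:
    """Caesar-shift A-Z letters by `shift`, leaving other characters alone."""
    return "".join(
        ascii_uppercase[(ord(ch) - ord("A") + shift) % 26]
        if ch in ENGLISH_PERCENT else ch
        for ch in text.upper()
    )


def best_caesar_shift(ciphertext: str) -> int:
    """Return the encryption shift whose reversal scores lowest."""
    return min(range(26), key=lambda s: chi_squared(shift_letters(ciphertext, -s)))
```

A general substitution cipher will score high under every shift, which is itself a useful signal that you are not looking at a Caesar cipher.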

Examples

Input:     "VKKRX RX TGRWD ZRYC RZK"
Top freq:  R (5), K (3), X (2), Z (2)
IC:        0.088 (high, likely monoalphabetic substitution)
Chi-sq:    high (distribution shifted from English baseline)

Input:     (Vigenère output)
IC:        0.041 (low, polyalphabetic; try Kasiski to find key length)

FAQ

What does the Index of Coincidence actually measure?

It's the probability that two letters picked at random from different positions in the text are the same. English averages about 0.0667; uniformly random text averages 1/26 ≈ 0.0385.

Why does Vigenère have low IC?

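Each plaintext letter is shifted by an amount set by the key letter at its position, cycling through the key. Identical plaintext letters therefore encrypt to different ciphertext letters, smearing the natural English distribution. Splitting the ciphertext into columns by key length restores it, because every letter in a column was shifted by the same amount.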
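One way to put this to work is to try each candidate key length, split the letters into that many columns, and see which length pushes the average column IC back up toward 0.067. The Python sketch below illustrates the idea; average_column_ic and rank_key_lengths are hypothetical names, and the search limit of 12 is an arbitrary assumption.

```python
from collections import Counter


def index_of_coincidence(letters) -> float:
    """IC of a sequence of letters; 0.0 if there are fewer than two."""
    counts = Counter(letters)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_column_ic(ciphertext: str, key_length: int) -> float:
    """Split letters into key_length columns and average their ICs."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    # Column i holds every key_length-th letter, starting at offset i.
    columns = [letters[i::key_length] for i in range(key_length)]
    return sum(index_of_coincidence(col) for col in columns) / key_length


def rank_key_lengths(ciphertext: str, max_length: int = 12) -> list:
    """Candidate key lengths, highest average column IC first."""
    scores = {k: average_column_ic(ciphertext, k) for k in range(1, max_length + 1)}
    return sorted(scores, key=scores.get, reverse=True)
```

Multiples of the true key length also score well, so when 3, 6 and 9 all rank near the top, the smallest is usually the one to try first.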

Can I solve a cipher with just frequency analysis?

For monoalphabetic substitutions on long-enough text, yes. Polyalphabetic ciphers need more: you first have to find the key length. Once you have it, frequency analysis on each key offset reduces the problem to a sequence of independent Caesar shifts.
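To make the "sequence of Caesar shifts" idea concrete, here is a hedged Python sketch that recovers a Vigenère key one column at a time. It assumes the key length is already known, that the key letter A means a shift of 0, and that the approximate English frequencies below are a good enough baseline. The names best_shift and recover_key are illustrative only.

```python
from collections import Counter
from string import ascii_uppercase

# Approximate English frequencies in percent for A..Z (assumed baseline).
ENGLISH = [8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77,
           4.03, 2.41, 6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98,
           2.36, 0.15, 1.97, 0.07]


def best_shift(column: str) -> int:
    """Return the shift (0-25) that makes this column look most like English."""
    counts = Counter(column)
    n = len(column)
    if n == 0:
        return 0  # empty column: nothing to score

    def score(shift: int) -> float:
        # Plaintext letter i shows up as ciphertext letter (i + shift) % 26,
        # so compare that ciphertext count with the expected count for i.
        return sum(
            (counts[ascii_uppercase[(i + shift) % 26]] - n * p / 100) ** 2
            / (n * p / 100)
            for i, p in enumerate(ENGLISH)
        )

    return min(range(26), key=score)


def recover_key(ciphertext: str, key_length: int) -> str:
    """Solve each column as its own Caesar cipher and spell out the key."""
    letters = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    return "".join(
        ascii_uppercase[best_shift(letters[i::key_length])]
        for i in range(key_length)
    )
```

Each column needs enough letters for its counts to be meaningful, so short ciphertexts with long keys may yield a key with a few wrong letters that you have to fix by eye.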
