Levenshtein Distance
Compute edit distance and similarity between two strings.
Overview
The Levenshtein distance between two strings is the minimum number of single-character edits — insertions, deletions, or substitutions — needed to turn one into the other. The tool returns the distance, a normalized similarity score (0 to 1), and optionally the actual sequence of edits.
Developers building fuzzy search, spell-checkers, autocomplete, and DNA-style sequence alignment reach for Levenshtein distance constantly. Data engineers also use it to dedupe customer records when an exact match misses near-duplicates ("Robert Smith" vs "Roberrt Smith").
How it works
The classic algorithm fills a (m+1) × (n+1) dynamic-programming matrix where cell (i, j) holds the distance between the first i characters of the source and the first j characters of the target. Each cell is the minimum of three neighbors plus the edit cost: insertion (cell above + 1), deletion (cell to the left + 1), or substitution (cell diagonal + 0 if characters match, else +1). The final answer sits in the bottom-right corner.
Similarity is reported as 1 − distance / max(len(a), len(b)), giving a 0–1 score useful for thresholds in fuzzy matching.
Examples
A: kitten
B: sitting
Distance: 3 (k→s, e→i, insert g)
A: book
B: back
Distance: 2 (o→a, o→c — wait, two subs gives "back"? Actually book→bock→back)
A: flaw
B: lawn
Distance: 2 (delete f, insert n at end)
FAQ
What's the difference between Levenshtein and Hamming distance?
Hamming distance only counts substitutions and requires equal-length strings. Levenshtein allows insertions and deletions too, so it works on strings of different lengths.
Are insertions and deletions weighted equally?
In the standard Levenshtein algorithm, yes — each costs 1. Weighted variants (Damerau-Levenshtein, custom cost matrices) exist when you want, say, swap-adjacent or vowel-substitution to count differently.
Is it the same as "fuzzy match score"?
Fuzzy-matching libraries often normalize Levenshtein into a 0–100 score. The raw distance is the integer underneath.