Unicode Normalizer (NFC / NFD / NFKC / NFKD)

Normalize text to one of the four Unicode normalization forms.

Overview

Normalize any Unicode text to one of the four normalization forms defined by the Unicode Standard (UAX #15): NFC, NFD, NFKC, or NFKD. Normalization gives equivalent text a single code point sequence, so a program sees "the same" character the same way every time. For example, "é" written as one code point (U+00E9) and "é" written as "e" plus a combining acute accent (U+0065 U+0301) become identical strings once both are normalized to the same form.
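As a minimal sketch of that comparison, the snippet below uses Python's standard `unicodedata` module. The helper name `same_text` is an assumption made for illustration and is not part of this tool.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form.

    `same_text` is a hypothetical helper, not an API of this tool.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

print(composed == decomposed)            # raw comparison of code points
print(same_text(composed, decomposed))   # comparison after normalizing to NFC
```

Any of the four forms works for the comparison as long as both sides use the same one.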

Developers comparing user input, search-index builders deduplicating queries, file-system tools fixing inconsistent filenames, and security engineers hunting for canonicalization bugs all reach for a Unicode normalizer. Without it, two visually identical strings can fail equality checks for no obvious reason.

How it works

The four forms differ on two axes: composition (combining marks merged with their base character) versus decomposition (split apart), and canonical (preserves meaning) versus compatibility (also collapses formatting variants such as full-width forms and ligatures).

NFC applies canonical decomposition and then recomposes: "e" + combining acute → "é". NFD applies canonical decomposition only: "é" → "e" + combining acute. NFKC applies compatibility decomposition and then canonical composition (full-width digits become regular digits, ligatures split). NFKD applies compatibility decomposition only. NFC is what most web platforms use; NFD is associated with macOS file systems. NFKC is useful for search, NFKD for aggressive comparison.
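To see the two axes side by side, here is a small sketch that runs one sample through all four forms and lists the resulting code points. It assumes Python 3.9 or later; `code_points` and `all_forms` are illustrative names, not part of this tool.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points(text: str) -> list[str]:
    """Render each character as a U+XXXX label."""
    return [f"U+{ord(ch):04X}" for ch in text]


def all_forms(text: str) -> dict[str, list[str]]:
    """Normalize text to every form and return the code points of each result."""
    return {form: code_points(unicodedata.normalize(form, text)) for form in FORMS}


# Sample: precomposed é (U+00E9) followed by full-width digit one (U+FF11).
sample = "\u00e9\uff11"
for form, cps in all_forms(sample).items():
    print(f"{form:5} {' '.join(cps)}")
```

In the output, the canonical forms keep the full-width digit while the compatibility forms replace it with U+0031, and the decomposed forms split é into U+0065 U+0301.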

Examples

Input:   "café" (e + combining acute: U+0065 U+0301)
NFC:     café    (single composed character for é: U+00E9)
NFD:     café    (decomposed: e + combining acute, U+0065 U+0301)

Input:   ﬁve         (with the ligature ﬁ, U+FB01)
NFKC:    five        (compatibility folds ﬁ into f + i)

Input:   １２３       (full-width digits, U+FF11–U+FF13)
NFKC:    123         (compatibility folds to ASCII)
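The examples above can be reproduced with Python's standard `unicodedata` module. This is a sketch for checking results by hand, with the inputs written as escapes so the exact code points are unambiguous; the variable names are illustrative only.

```python
import unicodedata

# "café" typed with a combining acute accent (5 code points).
cafe = "cafe\u0301"
nfc = unicodedata.normalize("NFC", cafe)
nfd = unicodedata.normalize("NFD", cafe)

# "ﬁve" starting with the ligature U+FB01.
ligature = unicodedata.normalize("NFKC", "\ufb01ve")

# Full-width digits U+FF11 to U+FF13.
digits = unicodedata.normalize("NFKC", "\uff11\uff12\uff13")

print(len(nfc), len(nfd))   # code point counts of the composed and decomposed results
print(ligature, digits)
```

The NFC and NFD results render identically on screen, so comparing lengths or code points is the reliable way to tell them apart.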

FAQ

Which form should I store?

NFC is the safest default for web and most platforms. It is usually the most compact representation that still preserves meaning, although a small set of characters excluded from composition stay decomposed.
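One way to apply this at the point where text is stored is sketched below. It assumes Python 3.8 or later for `unicodedata.is_normalized`; the function name `to_storage_form` is hypothetical.

```python
import unicodedata


def to_storage_form(text: str) -> str:
    """Return text in NFC, skipping the rewrite when it is already normalized.

    `to_storage_form` is an illustrative helper, not part of this tool.
    """
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)


print(to_storage_form("cafe\u0301") == "caf\u00e9")
```

Normalizing once on the way in means later equality checks and index lookups can compare stored values directly.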

What's the difference between canonical and compatibility?

Canonical normalization treats representations that are "the same character" identically (precomposed vs decomposed accents). Compatibility goes further and folds visually similar characters (full-width vs half-width, ligatures vs split letters) even though they're technically distinct. That folding discards information and cannot be reversed.
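A quick way to tell whether compatibility folding would alter a string is to compare its NFC and NFKC results, as in this sketch. The helper name `compatibility_changes` is an assumption for illustration.

```python
import unicodedata


def compatibility_changes(text: str) -> bool:
    """True when NFKC alters text beyond what NFC already does."""
    return unicodedata.normalize("NFKC", text) != unicodedata.normalize("NFC", text)


# Canonical difference only: both forms produce the same composed é.
print(compatibility_changes("e\u0301"))

# Superscript two (U+00B2) is kept by NFC but folded to "2" by NFKC.
print(compatibility_changes("x\u00b2"))
print(unicodedata.normalize("NFKC", "x\u00b2"))
```

Because "x²" and "x2" mean different things, compatibility forms are better suited to building search or comparison keys than to replacing the stored text.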

Why does macOS use NFD?

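The HFS+ filesystem stored filenames in a decomposed form close to NFD (Apple's own variant), which has caused interop friction with NFC-using systems for years. APFS does not rewrite names to NFD: it preserves the name it is given and, on macOS, treats canonically equivalent names as the same file. Decomposed names still turn up, for example on files that originated on HFS+ volumes.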
