POSIX / PCRE Regex Translator
Translate regex between POSIX character classes and PCRE shorthand.
Overview
Translate a regex between POSIX bracket character classes ([[:digit:]], [[:alpha:]]) and PCRE shorthand (\d, \w) - in either direction. Useful when porting a pattern from a Unix CLI tool to a language runtime, or vice versa.
It's for developers working across the regex flavour boundary - moving patterns between grep -E, awk, sed, and language runtimes like Python, JavaScript, Java, .NET, or PHP. Reach for it when a pattern that works in one tool throws "unknown escape sequence" in another.
How it works
POSIX character classes (specified in POSIX.2) are bracketed names: [:alpha:], [:digit:], [:space:], [:alnum:], [:upper:], etc. They are valid only inside a bracket expression, so [[:digit:]] (class name inside outer brackets) matches one digit, while a bare [:digit:] does not. PCRE shorthand (\d, \w, \s) is the Perl-derived form most modern regex engines use; it is more compact and can be written on its own, without enclosing brackets.
The mappings aren't always exactly equivalent. \w in PCRE matches [A-Za-z0-9] plus underscore, while POSIX [:alnum:] is alphanumeric only, so \w has to become [[:alnum:]_] rather than [[:alnum:]]. The translator flags differences like this so a ported pattern doesn't quietly change what it matches.
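The PCRE-to-POSIX direction can be sketched in a few lines of Python. This is a minimal illustration, not the translator's actual implementation: the names POSIX_BODIES and pcre_to_posix are hypothetical, only \d, \w, \s and their negations are handled, and ASCII semantics are assumed. Other escapes are copied through unchanged, although a backslash may be treated differently inside a POSIX bracket expression.

```python
# Hypothetical mapping table: shorthand letter -> body of the POSIX bracket expression.
POSIX_BODIES = {"d": "[:digit:]", "w": "[:alnum:]_", "s": "[:space:]"}


def pcre_to_posix(pattern: str) -> str:
    """Rewrite \\d, \\w, \\s (and \\D, \\W, \\S) as POSIX bracket expressions."""
    out = []
    i = 0
    in_bracket = False  # True while we are inside [...]
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            body = POSIX_BODIES.get(nxt.lower())
            if body is None:
                out.append(ch + nxt)  # unrelated escape: copy as-is
            elif in_bracket:
                if nxt.isupper():
                    # e.g. [\D.] has no direct POSIX spelling
                    raise ValueError("negated shorthand inside a bracket expression")
                out.append(body)  # already bracketed: add only the class name
            elif nxt.isupper():
                out.append("[^" + body + "]")
            else:
                out.append("[" + body + "]")
            i += 2
            continue
        if in_bracket and pattern.startswith("[:", i):
            # Copy an existing POSIX class such as [:alpha:] untouched.
            end = pattern.find(":]", i + 2)
            if end != -1:
                out.append(pattern[i:end + 2])
                i = end + 2
                continue
        if ch == "[" and not in_bracket:
            in_bracket = True
        elif ch == "]" and in_bracket:
            in_bracket = False
        out.append(ch)
        i += 1
    return "".join(out)
```

Tracking whether the scanner is inside brackets matters because [\d.] should become [[:digit:].], not [[[:digit:]].].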
Examples
- PCRE to POSIX:
  \d+ -> [[:digit:]]+
  \w+ -> [[:alnum:]_]+
  \s+ -> [[:space:]]+
- POSIX to PCRE:
  [[:upper:][:lower:]] -> [A-Za-z] (or [a-zA-Z])
  [[:punct:]] -> [[:punct:]] (no exact PCRE shorthand)
- Mixed context preserved:
  ^\d{3}-\d{4}$ -> ^[[:digit:]]{3}-[[:digit:]]{4}$
- Negation:
  \D -> [^[:digit:]]
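The POSIX-to-PCRE direction shown in the examples can be approximated with a plain lookup. The sketch below is an assumption about one possible approach, not the tool's own code: WHOLE_CLASS_TO_SHORTHAND and posix_to_pcre are hypothetical names, only bracket expressions that consist of exactly one known class are rewritten, and anything else (such as [[:punct:]]) is left untouched. Plain substring replacement can misfire on unusual inputs such as a literal [ immediately before a class name, so a real translator would need a proper parser.

```python
# Hypothetical table: whole bracket expression -> PCRE shorthand (ASCII semantics assumed).
WHOLE_CLASS_TO_SHORTHAND = {
    "[[:digit:]]": r"\d",
    "[^[:digit:]]": r"\D",
    "[[:space:]]": r"\s",
    "[^[:space:]]": r"\S",
    "[[:alnum:]_]": r"\w",
    "[^[:alnum:]_]": r"\W",
}


def posix_to_pcre(pattern: str) -> str:
    """Replace single-class bracket expressions with PCRE shorthand."""
    for posix_form, shorthand in WHOLE_CLASS_TO_SHORTHAND.items():
        pattern = pattern.replace(posix_form, shorthand)
    return pattern


print(posix_to_pcre("^[[:digit:]]{3}-[[:digit:]]{4}$"))
print(posix_to_pcre("[[:punct:]]"))  # no shorthand exists, so it is preserved
```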
FAQ
Why doesn't basic grep recognise \d?
Plain grep uses BRE (basic regular expressions), and \d is not part of POSIX BRE or ERE, so switching to grep -E does not help on its own. Write [[:digit:]] (or [0-9]) instead, or use grep -P for PCRE (where your grep build supports it).
Is \w always the same?
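No - \w semantics differ by engine. It includes the underscore in all the common engines, but what counts as a letter or digit varies: in PCRE and Java, \w is ASCII-only by default, while in Python 3 (str patterns) and .NET it is Unicode-aware by default. The translator notes the most common differences.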
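The effect is easy to see in Python, where the re.ASCII flag switches \w between the Unicode-aware default for str patterns and the ASCII-only behaviour that [[:alnum:]_] has in the C locale. The sample string is made up for illustration.

```python
import re

# "naïve_café", written with escapes so the code points are unambiguous.
text = "na\u00efve_caf\u00e9"

# Default for str patterns: \w is Unicode-aware, so accented letters count.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is limited to [A-Za-z0-9_].
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # the whole string as one word
print(ascii_words)    # split wherever a non-ASCII letter appears
```

A pattern translated to [[:alnum:]_] and run under an ASCII locale behaves like the second call, not the first.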
What about [:print:] and [:cntrl:]?
POSIX has more classes than PCRE has shorthand for. The translator preserves these POSIX classes when no PCRE equivalent exists.
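If the target engine accepts neither the shorthand nor POSIX class names (Python's re module, for example, does not support [:print:]), explicit ASCII ranges can stand in. The ranges below assume the C/POSIX locale; ASCII_RANGES is a hypothetical name used only for this sketch, and other locales may classify characters differently.

```python
import re

# Assumed C-locale equivalents, written as bracket-expression bodies.
ASCII_RANGES = {
    "print": r"\x20-\x7e",      # space through tilde
    "cntrl": r"\x00-\x1f\x7f",  # C0 control characters plus DEL
}

printable = re.compile("[" + ASCII_RANGES["print"] + "]")
control = re.compile("[" + ASCII_RANGES["cntrl"] + "]")

print(bool(printable.fullmatch("A")))   # a letter is printable
print(bool(control.fullmatch("\t")))    # a tab is a control character
```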
Does the translator handle Unicode property escapes?
PCRE's \p{L} (Unicode letter) and similar property escapes are different from POSIX classes. The translator focuses on ASCII-equivalent mappings; Unicode-aware translations need engine-specific handling.