POSIX / PCRE Regex Translator

Translate regex between POSIX character classes and PCRE shorthand.

Overview

Translate a regex between POSIX bracket character classes ([[:digit:]], [[:alpha:]]) and PCRE shorthand (\d, \w) - in either direction. Useful when porting a pattern from a Unix CLI tool to a language runtime, or vice versa.

It's for developers working across the regex flavour boundary - moving patterns between grep -E, awk, sed, and language runtimes like Python, JavaScript, Java, .NET, or PHP. Reach for it when a pattern that works in one tool throws "unknown escape sequence" in another.

How it works

POSIX character classes (specified in POSIX.2) are bracketed names: [:alpha:], [:digit:], [:space:], [:alnum:], [:upper:], etc. They are valid only inside a bracket expression: [[:digit:]] matches one digit, and [[:digit:].] matches a digit or a dot. PCRE shorthand (\d, \w, \s) is the Perl-derived form most modern regex engines use; it is shorter to write and works both inside and outside character classes.

The mappings aren't always exact. \w in PCRE matches [A-Za-z0-9_], whereas POSIX [:alnum:] covers only letters and digits, so \w translates to [[:alnum:]_] rather than [[:alnum:]]. The translator flags differences like this so a ported pattern doesn't silently change what it matches.
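
As a rough illustration of the PCRE-to-POSIX direction, here is a minimal Python sketch. It is not the translator's actual implementation: the names POSIX_BODY and pcre_to_posix are hypothetical, the mapping is ASCII-oriented, and the scanner tracks bracket context only in the simplest way (it does not handle a literal ] placed first in a class, for example).

```python
# Hypothetical mapping from PCRE shorthand letters to the POSIX text that
# goes inside a bracket expression. \w needs the extra underscore.
POSIX_BODY = {
    "d": "[:digit:]",
    "w": "[:alnum:]_",
    "s": "[:space:]",
}


def pcre_to_posix(pattern):
    # Walk the pattern once, remembering whether we are inside [...].
    out = []
    in_bracket = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in POSIX_BODY:
                body = POSIX_BODY[nxt]
                # Inside a class, emit only the name; outside, wrap it in [].
                out.append(body if in_bracket else "[" + body + "]")
            elif nxt.lower() in POSIX_BODY and not in_bracket:
                # \D, \W, \S outside a class become a negated bracket.
                out.append("[^" + POSIX_BODY[nxt.lower()] + "]")
            else:
                # Any other escape (including \D inside a class) is left as-is.
                out.append(ch + nxt)
            i += 2
            continue
        if ch == "[" and not in_bracket:
            in_bracket = True
        elif ch == "]" and in_bracket:
            in_bracket = False
        out.append(ch)
        i += 1
    return "".join(out)


print(pcre_to_posix(r"\w+"))             # [[:alnum:]_]+
print(pcre_to_posix(r"^\d{3}-\d{4}$"))   # ^[[:digit:]]{3}-[[:digit:]]{4}$
print(pcre_to_posix(r"[\d.]+"))          # [[:digit:].]+
```

Tracking bracket context matters because \d inside an existing class must become [:digit:], not a nested [[:digit:]].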

Examples

PCRE to POSIX:

\d+    ->  [[:digit:]]+
\w+    ->  [[:alnum:]_]+
\s+    ->  [[:space:]]+

POSIX to PCRE:

[[:upper:][:lower:]]  ->  [A-Za-z]   (or [a-zA-Z])
[[:punct:]]           ->  [[:punct:]]   (no exact PCRE shorthand)
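
The reverse direction can be sketched in Python with plain string replacement. This is an assumption-laden illustration, not the tool's real logic: WHOLE, INSIDE, and posix_to_pcre are hypothetical names, the expansions assume ASCII, and because nothing is actually parsed, unusual inputs (such as a literal [ directly before a class name) are not handled.

```python
# Whole bracket expressions that collapse to a single PCRE shorthand.
WHOLE = {
    "[[:digit:]]": r"\d",
    "[^[:digit:]]": r"\D",
    "[[:alnum:]_]": r"\w",
    "[^[:alnum:]_]": r"\W",
    "[[:space:]]": r"\s",
    "[^[:space:]]": r"\S",
}

# Class names rewritten in place when they sit next to other class members.
INSIDE = {
    "[:digit:]": r"\d",
    "[:space:]": r"\s",
    "[:upper:]": "A-Z",
    "[:lower:]": "a-z",
    "[:alpha:]": "A-Za-z",
    "[:alnum:]": "A-Za-z0-9",
}


def posix_to_pcre(pattern):
    # First collapse whole-bracket forms, then rewrite any remaining names.
    for posix, short in WHOLE.items():
        pattern = pattern.replace(posix, short)
    for posix, inner in INSIDE.items():
        pattern = pattern.replace(posix, inner)
    # Classes with no entry above (e.g. [:punct:]) pass through unchanged.
    return pattern


print(posix_to_pcre("[[:upper:][:lower:]]"))   # [A-Za-z]
print(posix_to_pcre("[[:punct:]]"))            # [[:punct:]]
print(posix_to_pcre("[[:alnum:]_]+"))          # \w+
```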

Mixed context preserved:

^\d{3}-\d{4}$   ->   ^[[:digit:]]{3}-[[:digit:]]{4}$

Negation:
```
\D   ->  [^[:digit:]]
```

FAQ

Why doesn't basic grep recognise \d?

Plain grep uses BRE (basic regex), and neither BRE nor ERE (grep -E) defines \d, so switching to grep -E does not help. Use [[:digit:]] in either mode, or grep -P for PCRE (where supported).

Is \w always the same?

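No - \w semantics differ by engine. It includes underscore in essentially every mainstream engine; what varies is Unicode handling. In PCRE (without the UCP option), JavaScript, and Java (by default), \w is ASCII-only: [A-Za-z0-9_]. In .NET and in Python 3 str patterns, \w is Unicode-aware by default. The translator notes the most common differences.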
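
To see how \w can vary even within a single engine, here is a small Python check using the standard re module. The helper name is_word_char is made up for this example; re.ASCII is the real flag that restricts \w to [A-Za-z0-9_].

```python
import re


def is_word_char(ch, ascii_only=False):
    # With re.ASCII, \w means [A-Za-z0-9_]; without it, str patterns in
    # Python 3 treat \w as Unicode-aware.
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\w", ch, flags) is not None


print(is_word_char("_"))                        # True: underscore is a word character
print(is_word_char("\u00e9"))                   # True: e-acute matches by default
print(is_word_char("\u00e9", ascii_only=True))  # False: not in [A-Za-z0-9_]
```

A pattern translated to [[:alnum:]_] and run under an ASCII or C locale behaves like the re.ASCII case, which is why the difference is worth flagging.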

What about [:print:] and [:cntrl:]?

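POSIX has more classes than PCRE has shorthand for. The translator preserves these POSIX classes when no PCRE equivalent exists. PCRE itself accepts POSIX classes inside brackets, so the preserved form still works there; engines such as Python's re or JavaScript do not, and need an explicit character range instead.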

Does the translator handle Unicode property escapes?

PCRE's \p{L} (Unicode letter) and similar property escapes are different from POSIX classes. The translator focuses on ASCII-equivalent mappings; Unicode-aware translations need engine-specific handling.
