POSIX / PCRE Regex Translator

Translate regex between POSIX character classes and PCRE shorthand.

Overview

Translate a regex between POSIX bracket character classes ([[:digit:]], [[:alpha:]]) and PCRE shorthand (\d, \w) - in either direction. Useful when porting a pattern from a Unix CLI tool to a language runtime, or vice versa.

It's for developers working across the regex flavour boundary - moving patterns between grep -E, awk, sed, and language runtimes like Python, JavaScript, Java, .NET, or PHP. Reach for it when a pattern that works in one tool throws "unknown escape sequence" in another.

How it works

POSIX character classes (specified in POSIX.2) are bracketed names: [:alpha:], [:digit:], [:space:], [:alnum:], [:upper:], etc. They are valid only inside a bracket expression, so the class [:digit:] is written [[:digit:]] to match one digit. PCRE shorthand (\d, \w, \s) is the Perl-derived form most modern regex engines use, and is shorter to write outside character classes. PCRE also accepts POSIX class names inside brackets, which is why some classes can pass through a translation unchanged.

The mappings aren't always exactly equivalent: \w in PCRE matches [A-Za-z0-9_], while POSIX [:alnum:] covers only letters and digits, so \w translates to [[:alnum:]_] rather than [[:alnum:]]. The translator notes these differences so a ported pattern doesn't silently change what it matches.
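
A minimal Python sketch of the PCRE-to-POSIX direction may make the bracket rule concrete. The names PCRE_TO_POSIX and pcre_to_posix are hypothetical and chosen for illustration; this is not the tool's actual implementation. It assumes the input is plain PCRE with no POSIX classes already present, and it does not handle a literal "]" placed first in a bracket expression.

```python
# Hypothetical mapping table (illustrative names, not the tool's API).
# Each shorthand maps to the text that goes INSIDE a bracket expression.
PCRE_TO_POSIX = {
    "d": "[:digit:]",
    "w": "[:alnum:]_",  # \w also matches underscore, so "_" is added explicitly
    "s": "[:space:]",
}


def pcre_to_posix(pattern: str) -> str:
    """Rewrite \\d, \\w, \\s and their negations as POSIX bracket expressions."""
    out = []
    i = 0
    in_brackets = False  # True while scanning inside [...]
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            body = PCRE_TO_POSIX.get(nxt.lower())
            if body is None:
                out.append(ch + nxt)  # unrelated escape: copy through unchanged
            elif not in_brackets:
                negate = "^" if nxt.isupper() else ""
                out.append("[" + negate + body + "]")
            elif nxt.islower():
                out.append(body)  # already inside [...]: no extra brackets
            else:
                out.append(ch + nxt)  # negated shorthand inside [...]: left as-is
            i += 2
            continue
        if ch == "[" and not in_brackets:
            in_brackets = True
        elif ch == "]" and in_brackets:
            in_brackets = False
        out.append(ch)
        i += 1
    return "".join(out)


print(pcre_to_posix(r"^\d{3}-\d{4}$"))  # ^[[:digit:]]{3}-[[:digit:]]{4}$
print(pcre_to_posix(r"[\d.]+"))         # [[:digit:].]+
```

Negated shorthand inside an existing bracket expression (for example [\D,]) is copied through untouched in this sketch, because it has no direct single-class POSIX form.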

Examples

  • PCRE to POSIX:
    \d+    ->  [[:digit:]]+
    \w+    ->  [[:alnum:]_]+
    \s+    ->  [[:space:]]+
    
  • POSIX to PCRE:
    [[:upper:][:lower:]]  ->  [A-Za-z]   (or [a-zA-Z])
    [[:punct:]]           ->  [[:punct:]]   (no exact PCRE shorthand)
    
  • Mixed context preserved:
    ^\d{3}-\d{4}$   ->   ^[[:digit:]]{3}-[[:digit:]]{4}$
    
  • Negation:
    \D   ->  [^[:digit:]]
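
The reverse direction can be sketched in Python in a similar way. This is an assumption-laden illustration, not the tool's code: POSIX_TO_PCRE, SINGLE_CLASS, and posix_to_pcre are hypothetical names, and only bracket expressions that hold exactly one POSIX class (optionally negated) are rewritten. Anything else, such as [[:punct:]] or [[:upper:][:lower:]], is left as it was.

```python
import re

# Hypothetical reverse table: POSIX class name -> PCRE shorthand letter.
POSIX_TO_PCRE = {"digit": "d", "space": "s"}

# A bracket expression holding exactly one POSIX class, optionally negated,
# e.g. [[:digit:]] or [^[:space:]].
SINGLE_CLASS = re.compile(r"\[(\^?)\[:([a-z]+):\]\]")


def posix_to_pcre(pattern: str) -> str:
    # \w is alnum plus underscore, so only this exact form maps back to it.
    pattern = pattern.replace("[^[:alnum:]_]", r"\W")
    pattern = pattern.replace("[[:alnum:]_]", r"\w")

    def replace(match):
        negated, name = match.group(1), match.group(2)
        letter = POSIX_TO_PCRE.get(name)
        if letter is None:
            return match.group(0)  # no shorthand (e.g. punct): keep the class
        return "\\" + (letter.upper() if negated else letter)

    return SINGLE_CLASS.sub(replace, pattern)


print(posix_to_pcre("^[[:digit:]]{3}-[[:digit:]]{4}$"))  # ^\d{3}-\d{4}$
print(posix_to_pcre("[[:punct:]]"))                      # [[:punct:]]
```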
    

FAQ

Why doesn't basic grep recognise \d?

Plain grep uses BRE (basic regular expressions) and grep -E uses ERE (extended regular expressions); neither defines \d. Write [[:digit:]] in either mode, or use grep -P for PCRE (where supported).

Is \w always the same?

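No - \w semantics differ by engine. Every engine listed here includes the underscore in \w; what differs is whether non-ASCII letters and digits count. PCRE is ASCII-only unless its Unicode property mode (UCP) is enabled, while Python 3 (for str patterns) and .NET are Unicode-aware by default. The translator notes the most common differences.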
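
As one concrete illustration, the following Python snippet shows the same \w+ pattern matching differently depending on whether the engine treats \w as Unicode-aware or ASCII-only. It uses only the standard re module; the sample string is made up for the demonstration.

```python
import re

text = "caf\u00e9_1"  # "café_1": one non-ASCII letter plus an underscore

# Python 3 str patterns: \w is Unicode-aware by default.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [A-Za-z0-9_], the ASCII reading of [[:alnum:]_].
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['café_1']
print(ascii_words)    # ['caf', '_1']
```

A pattern ported from an ASCII-only tool to a Unicode-aware runtime can therefore match more text than it did before, even though the pattern itself is unchanged.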

What about [:print:] and [:cntrl:]?

POSIX has more classes than PCRE has shorthand for. The translator preserves these POSIX classes when no PCRE equivalent exists.
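
Where the target engine does not accept POSIX class names at all (Python's re module is one example), one possible fallback is to spell the class out as explicit ranges. The sketch below assumes ASCII and the C/POSIX locale; ASCII_RANGES and expand_class are hypothetical names, and this is not a description of what the translator itself emits.

```python
import re

# ASCII (C/POSIX locale) equivalents written as explicit ranges.
# Hypothetical table; other locales may define these classes differently.
ASCII_RANGES = {
    "cntrl": r"\x00-\x1f\x7f",  # control characters, including DEL
    "print": r"\x20-\x7e",      # printable characters, including space
}


def expand_class(name: str) -> str:
    """Return a bracket expression equivalent to [[:name:]] for ASCII input."""
    return "[" + ASCII_RANGES[name] + "]"


cntrl = re.compile(expand_class("cntrl"))
printable = re.compile(expand_class("print"))

print(bool(cntrl.fullmatch("\t")))      # True: tab is a control character
print(bool(printable.fullmatch("\t")))  # False: tab is not printable
print(bool(printable.fullmatch(" ")))   # True: space counts as printable
```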

Does the translator handle Unicode property escapes?

PCRE's \p{L} (Unicode letter) and similar property escapes are different from POSIX classes. The translator focuses on ASCII-equivalent mappings; Unicode-aware translations need engine-specific handling.
