PDF Pattern Redactor
Redact text matching regex patterns (SSNs, emails, phone numbers) across every PDF page.
Overview
The PDF pattern redactor scans every page of a PDF for text matching one or more regular expressions — SSNs, credit card numbers, email addresses, phone numbers, custom identifiers — and replaces every match with a solid black box. The result is a sanitised PDF safe to share externally.
Compliance teams preparing case files for discovery, HR staff sharing employment records, and journalists publishing source documents reach for this when manual redaction is too tedious to do per-occurrence.
How it works
The redactor extracts the text from each PDF page along with the bounding rectangle of every glyph run (using the page's content stream and font metrics). It then runs the supplied regular expression patterns against the extracted text. For each match, it computes the rectangle covering the matched glyphs and draws a filled black rectangle over those coordinates in the page content stream.
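The matching step can be sketched in Python under a few assumptions: a hypothetical extractor has already produced one Glyph per character with its bounding box in PDF points, glyphs on the same visual line share a y0 value, and the page text is the glyphs' characters concatenated in reading order. Glyph, match_rects, and monospace_line are illustrative names, not part of the tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """One extracted character and its bounding box in PDF points (hypothetical extractor output)."""
    char: str
    x0: float
    y0: float
    x1: float
    y1: float


def monospace_line(text, x=72.0, y=700.0, advance=6.0, height=10.0):
    """Stand-in extractor for testing: lays text out on one line with a fixed advance width."""
    return [Glyph(c, x + i * advance, y, x + (i + 1) * advance, y + height)
            for i, c in enumerate(text)]


def match_rects(glyphs, pattern):
    """Return one (x0, y0, x1, y1) box per visual line touched by each regex match."""
    text = "".join(g.char for g in glyphs)  # page text, one character per glyph
    rects = []
    for m in re.finditer(pattern, text):
        # Group matched glyphs by line. Assumes every glyph on a line reports
        # the same y0, so a wrapped match yields one box per line.
        lines = {}
        for g in glyphs[m.start():m.end()]:
            if not g.char.isspace():  # whitespace needs no cover
                lines.setdefault(g.y0, []).append(g)
        for line in lines.values():
            rects.append((min(g.x0 for g in line), min(g.y0 for g in line),
                          max(g.x1 for g in line), max(g.y1 for g in line)))
    return rects


print(match_rects(monospace_line("SSN 123-45-6789 ok"), r"\d{3}-\d{2}-\d{4}"))
```

Grouping by line means a match that wraps, such as "Jane" at the end of one line and "Doe" at the start of the next, yields two rectangles rather than one box spanning the gap between lines.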
The redaction is destructive on two levels. Visually, the black rectangle hides the text beneath it; in the content itself, the underlying text run is removed or replaced with whitespace, so copy-paste and text search no longer reveal the redacted content. Both steps use standard PDF content-stream operators: re and f to draw and fill the rectangle, and rewritten operands for the text-showing operators (Tj, TJ) to drop the matched characters.
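The two halves of the redaction might look like this, again as a sketch: cover_ops emits the content-stream fragment that draws the boxes (q and Q save and restore the graphics state, rg sets the fill colour, re adds a rectangle given as x, y, width, height, and f fills it), and blank_out overwrites the matched characters with spaces. Both function names are hypothetical, and blank_out works on already-decoded text; in a real PDF the string operands of Tj and TJ are encoded per font, so rewriting them also means re-encoding the replacement.

```python
def cover_ops(rects):
    """Content-stream fragment that fills a black box over each (x0, y0, x1, y1) rect."""
    parts = ["q", "0 0 0 rg"]  # save graphics state; non-stroking colour = black
    for x0, y0, x1, y1 in rects:
        # "re" appends a rectangle (x y width height) to the path; "f" fills it.
        parts.append(f"{x0:g} {y0:g} {x1 - x0:g} {y1 - y0:g} re f")
    parts.append("Q")  # restore graphics state
    return " ".join(parts).encode("ascii")


def blank_out(text, spans):
    """Overwrite each (start, end) span of decoded text with spaces, keeping its length."""
    chars = list(text)
    for start, end in spans:
        chars[start:end] = " " * (end - start)
    return "".join(chars)


print(cover_ops([(96.0, 700.0, 162.0, 710.0)]))
print(blank_out("SSN 123-45-6789 ok", [(4, 15)]))
```

Keeping the string length unchanged keeps character offsets stable for later matches; in a real file the layout is only preserved if the replacement glyphs have the same widths, which spaces in a proportional font usually do not.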
Examples
- Redact every nine-digit Social Security Number with the pattern \d{3}-\d{2}-\d{4}.
- Black out email addresses with a permissive RFC 5322-inspired regex.
- Hide internal project codes that follow a PROJ-\d{5} shape.
- Strip phone numbers in many formats with a single combined pattern.
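Concrete patterns for the examples above could look like the following. The SSN and project-code patterns are the ones quoted in the list; the email and phone patterns are illustrative assumptions (the email one is a common permissive shape, far simpler than the full RFC 5322 grammar), not the tool's built-in defaults.

```python
import re

# SSN and PROJ patterns come from the examples; EMAIL and PHONE are
# illustrative assumptions, not the tool's defaults.
PATTERNS = {
    "ssn": r"\d{3}-\d{2}-\d{4}",
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "project": r"PROJ-\d{5}",
    # Optional +1 prefix, optional parentheses, space/dot/hyphen separators.
    "phone": r"(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}",
}

sample = ("SSN 123-45-6789, mail jane.doe@example.com, "
          "ticket PROJ-04217, call (555) 123-4567 or +1 555.987.6543")

for name, pattern in PATTERNS.items():
    print(name, re.findall(pattern, sample))
```

Because the SSN pattern has no word boundaries, it would also match inside a longer digit run such as 9123-45-67890; wrapping it in \b ... \b is a cheap way to tighten it.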
FAQ
Is the redaction reversible?
No. The underlying text is rewritten in addition to the visual cover-up, so neither rendering nor selecting reveals the original. Keep an unredacted backup before processing.
What about scanned PDFs without selectable text?
The redactor relies on the text layer. For image-only scans, run OCR first; otherwise the regexes have no text to match.
Can I use multiple patterns at once?
Yes. Provide them one per line; matches from any pattern are redacted in the same pass.
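One way to run several patterns in a single pass is to compile each non-blank line as its own regex, collect every match span, and merge spans that overlap so the same glyphs are not boxed twice. This is a sketch with hypothetical function names; whether the tool strips surrounding whitespace from each line, as parse_patterns does here, is an assumption.

```python
import re


def parse_patterns(block):
    """Compile one regex per non-blank line (surrounding whitespace stripped)."""
    return [re.compile(line.strip()) for line in block.splitlines() if line.strip()]


def match_spans(text, patterns):
    """Collect non-empty matches from every pattern and merge overlapping or touching spans."""
    spans = sorted(m.span() for p in patterns for m in p.finditer(text)
                   if m.end() > m.start())
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous span: extend it instead of
            # drawing a second box over the same glyphs.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(s) for s in merged]


patterns = parse_patterns("\n".join([r"PROJ-\d{5}", r"\d{3}-\d{2}-\d{4}", r"\d{5}"]))
print(match_spans("ID PROJ-12345 / 123-45-6789", patterns))
```

Here \d{5} also matches the digits inside PROJ-12345, and merging folds that hit into the project-code span rather than producing a second, overlapping rectangle.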
Does it match across line breaks?
Pattern matching runs on each page's logical text flow. Text wrapped across visual lines is typically matchable; text split across two pages is not, since each page is processed independently.
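The page-at-a-time behaviour can be illustrated with plain strings. The sketch assumes the extractor joins a page's visual lines with a newline (the tool's actual joining rule is not specified here), so a pattern that allows whitespace between its parts, such as Jane\s+Doe, still matches a name that wraps, while an SSN split across two pages is never seen whole. find_per_page is a hypothetical helper.

```python
import re


def find_per_page(pages, pattern):
    """Match each page's text independently; returns (page_number, start, end) triples."""
    regex = re.compile(pattern)
    hits = []
    for page_no, page_text in enumerate(pages, start=1):
        for m in regex.finditer(page_text):
            hits.append((page_no, m.start(), m.end()))
    return hits


# Assumption: visual lines within a page are joined with "\n".
pages = [
    "Patient: Jane\nDoe, SSN 123-45-",  # name wraps; SSN runs off the page
    "6789 continued on page 2",
]

print(find_per_page(pages, r"Jane\s+Doe"))         # \s+ spans the line break
print(find_per_page(pages, r"\d{3}-\d{2}-\d{4}"))  # split across pages: no match
```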
Why is a portion of the match unredacted?
Some PDFs lay out text one glyph at a time with non-contiguous positioning. When the extractor cannot reliably reconstruct the run, the redactor errs on the side of leaving it visible rather than redacting unrelated content. Inspect the output, and if a match is only partly covered, redact the remainder by hand or regenerate the PDF from its source so the text is laid out in contiguous runs.
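The conservative rule described above can be approximated by checking, before redacting, that the glyphs of a match on one line sit on a single baseline in left-to-right order with only small gaps between them. The Glyph fields mirror a hypothetical extractor (y0 stands in for the baseline), and the 2-point max_gap tolerance is an assumed value, not one taken from the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """Extracted character with its bounding box in PDF points (hypothetical)."""
    char: str
    x0: float
    y0: float
    x1: float
    y1: float


def is_contiguous(glyphs, max_gap=2.0):
    """True if the glyphs share one line (same y0) and each starts within max_gap of the previous one's end."""
    for prev, cur in zip(glyphs, glyphs[1:]):
        if cur.y0 != prev.y0:
            return False  # jumped to another line mid-segment
        gap = cur.x0 - prev.x1
        # Large positive gap: glyphs scattered; large negative gap: out of order.
        if not -max_gap <= gap <= max_gap:
            return False
    return True


tight = [Glyph(c, 72 + i * 6, 700.0, 78 + i * 6, 710.0) for i, c in enumerate("123-45-6789")]
print(is_contiguous(tight))
```

In this sketch, a caller would leave a failing segment unboxed and flag it for manual review, trading an occasional visible fragment for never covering unrelated text.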