HTML → PDF (text only)
Strip HTML tags and render the visible text as a paginated PDF.
Overview
The HTML-to-PDF (text only) renderer takes any HTML snippet or full page, strips the tags down to readable text, and emits a paginated PDF. The output is plain prose laid out across pages, with no CSS, embedded scripts, or images, which makes it well suited to archiving an article in a long-term-readable format.
Knowledge workers archiving articles, students saving research material for offline reading, and compliance teams snapshotting a web page for record keeping reach for this when a faithful visual rendering is not needed. Common searches for this task include "convert HTML to text-only PDF", "save webpage as plain PDF", and "strip HTML and make printable PDF".
How it works
The renderer parses the HTML using a permissive tree builder that follows the WHATWG HTML5 parsing rules — tags are matched even when malformed, comments are stripped, and inline scripts and styles are discarded. Block elements (<p>, <div>, <h1>–<h6>, <li>, <blockquote>) become paragraphs with appropriate vertical spacing. Inline elements collapse into their text content; whitespace is normalised so multi-space and newline runs do not bloat the output.
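The extraction step can be sketched in Python. This is a minimal illustration, not the renderer's actual code: it swaps in the standard library's `html.parser.HTMLParser`, which is permissive but does not implement the full WHATWG tree-construction algorithm, and the names `TextExtractor` and `html_to_paragraphs` are hypothetical. It starts a new paragraph at each block element listed above, drops `<script>`/`<style>` content and comments, and collapses whitespace runs.

```python
from html.parser import HTMLParser
import re

# Block-level tags that start a new paragraph (the subset named above).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "blockquote"}
# Elements whose content is never visible text.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Hypothetical sketch: collect visible text, split at block boundaries.

    Assumption: html.parser is a tokenizer-level parser, not a WHATWG
    tree builder, so this only approximates the renderer's behaviour.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.paragraphs: list[str] = []
        self._buffer: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def flush(self) -> None:
        # Normalise whitespace: collapse runs of spaces/newlines to one space.
        text = re.sub(r"\s+", " ", "".join(self._buffer)).strip()
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()  # a block element ends the previous paragraph

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        # Inline elements (<b>, <a>, ...) have no handler, so their text
        # simply joins the current paragraph; link targets are never read.
        if self._skip_depth == 0:
            self._buffer.append(data)

    # Comments are dropped because handle_comment is not overridden.


def html_to_paragraphs(html: str) -> list[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text not closed by a block tag
    return parser.paragraphs
```

Unclosed tags are tolerated: `html_to_paragraphs("<p>a<p>b")` returns `["a", "b"]`.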
A server-side PDF engine then lays the text out into A4 portrait pages with comfortable margins and a single readable serif font. Page breaks happen between paragraphs whenever the next paragraph would overflow, so prose never breaks mid-sentence at the bottom of a page.
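The page-break rule can be illustrated with a greedy packer. This sketch assumes each paragraph is wrapped into lines of a fixed character width; a real PDF engine measures glyph widths in the chosen serif font on an A4 page, so `CHARS_PER_LINE`, `LINES_PER_PAGE`, and `paginate` are hypothetical stand-ins. The description above does not say what happens to a paragraph taller than a whole page; the sketch assumes it is cut at line boundaries.

```python
import textwrap

# Hypothetical layout numbers standing in for real font and page metrics.
CHARS_PER_LINE = 80
LINES_PER_PAGE = 50
PARAGRAPH_GAP = 1  # blank lines between paragraphs (vertical spacing)


def paginate(paragraphs: list[str],
             chars_per_line: int = CHARS_PER_LINE,
             lines_per_page: int = LINES_PER_PAGE) -> list[list[str]]:
    """Pack wrapped paragraphs onto pages, breaking only between paragraphs."""
    pages: list[list[str]] = []
    current: list[str] = []

    for para in paragraphs:
        lines = textwrap.wrap(para, width=chars_per_line)
        gap = PARAGRAPH_GAP if current else 0
        needed = gap + len(lines)

        if current and len(current) + needed > lines_per_page:
            # The next paragraph would overflow: start a new page before it.
            pages.append(current)
            current = []
            gap = 0

        current.extend([""] * gap)
        # Assumption: a paragraph taller than a full page must be split,
        # so it is cut at line boundaries onto following pages.
        for line in lines:
            if len(current) == lines_per_page:
                pages.append(current)
                current = []
            current.append(line)

    if current:
        pages.append(current)
    return pages
```

With three one-line paragraphs and a three-line page, the first two share page one (separated by a blank line) and the third moves whole to page two rather than being split.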
Examples
- Save a long-form blog post as a single PDF for offline reading.
- Archive the textual body of a knowledge-base article without its sidebar and navigation.
- Generate a clean printable version of a recipe page that omits ads and image galleries.
- Snapshot the readable content of an email-newsletter web view.
FAQ
Does it preserve images and styling?
No — this is a text-only renderer by design. For full-fidelity rendering (images, CSS, fonts), use a headless-browser-based converter instead.
Are links preserved?
Link targets are dropped; only the link text appears in the PDF. The output is meant to be self-contained reading material rather than an interactive document.
What about lists and tables?
Ordered lists keep their numbering and unordered lists keep their bullets. Tables are flattened to row-by-row text: they read sensibly but lose their grid alignment.
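The list and table behaviour can be sketched as two small formatting helpers. They assume the parser has already collected list items and table cells as plain strings; the helper names, the `•` bullet glyph, and the ` | ` cell separator are illustrative assumptions, not the renderer's documented output.

```python
def flatten_list(items: list[str], ordered: bool) -> list[str]:
    """Render list items as text lines, keeping numbers or bullets."""
    if ordered:
        # Ordered lists: number items from 1 in document order.
        return [f"{i}. {item}" for i, item in enumerate(items, start=1)]
    # Unordered lists: prefix each item with a bullet glyph (assumed).
    return [f"\u2022 {item}" for item in items]


def flatten_table(rows: list[list[str]], sep: str = " | ") -> list[str]:
    """Flatten a table to one line per row; column alignment is not kept."""
    return [sep.join(cell.strip() for cell in row) for row in rows]
```

For example, `flatten_table([["Ingredient", "Amount"], ["Flour", "200 g"]])` gives `["Ingredient | Amount", "Flour | 200 g"]`: each row stays readable, but cells in the same column no longer line up.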
How is malformed HTML handled?
The HTML5-style parser is forgiving: unclosed tags, mis-nested elements, and missing <html>/<body> wrappers are accepted, mirroring how a browser would interpret the markup.
Will it follow external URLs?
No. The input is treated as a literal HTML string, not a URL to fetch. To process a remote page, fetch it with another tool and paste its HTML in.