RTF → Plain Text

Strip RTF control words and groups to recover plain text.

Overview

The RTF-to-text converter strips Rich Text Format control words, font tables, and groups, leaving the plain readable text behind. Paste an RTF document (the kind WordPad and many email clients produce) and get back the prose without the formatting metadata.

It's the right tool when you've copied content from a document into a system that can't parse RTF, or when extracting text from a database column that stores legacy RTF blobs. Developers, content-migration teams, and anyone cleaning up legacy data can use it to recover the underlying text without opening Word.

How it works

RTF is a tagged text format introduced by Microsoft in 1987. The body is wrapped in {} groups and peppered with control words such as \b (bold), \par (paragraph break), and \fonttbl (font table). The converter walks the source, tracks group nesting, drops the metadata groups (font table, colour table, stylesheet, info), and outputs the literal text from the content groups.
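The walk described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the converter's actual implementation: rtf_to_text, SKIP_DESTINATIONS and TOKEN are hypothetical names, the destination list is abbreviated, any {\* ...} group is treated as droppable metadata, \'XX is always decoded as Windows-1252, and \uN is ignored, so the ANSI fallback character that follows it is kept instead.

```python
import re

# Hypothetical, abbreviated list of metadata destinations to drop; the RTF spec defines many more.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"   # control word, optional numeric parameter, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"   # hex escape \'XX
    r"|\\(.)"                 # control symbol: \\ \{ \} \* and friends
    r"|([{}])"                # group open / close
    r"|([^\\{}\r\n]+)"        # run of literal text
    r"|[\r\n]+",              # raw line breaks in the source are not content
    re.DOTALL,
)


def rtf_to_text(rtf: str) -> str:
    out = []
    saved = []           # "skipping" state of each enclosing group
    skipping = False     # True while inside a dropped destination
    group_start = False  # True right after "{", where a destination keyword may appear
    for m in TOKEN.finditer(rtf):
        word, _param, hexcode, symbol, brace, text = m.groups()
        if brace == "{":
            saved.append(skipping)
            group_start = True
            continue
        if brace == "}":
            skipping = saved.pop() if saved else False
            group_start = False
            continue
        if m.group(0)[0] in "\r\n":
            continue  # raw CR/LF carries no content
        at_start, group_start = group_start, False
        if word is not None:
            if at_start and word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
            # All other control words (formatting, \uN, ...) are ignored here.
        elif symbol is not None:
            if at_start and symbol == "*":
                skipping = True  # {\* ...}: ignorable destination
            elif not skipping and symbol in "\\{}":
                out.append(symbol)
            elif not skipping and symbol in "\r\n":
                out.append("\n")  # backslash + newline acts like \par
        elif hexcode is not None and not skipping:
            # Sketch assumption: always Windows-1252, ignoring the declared codepage.
            out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif text is not None and not skipping:
            out.append(text)
    return "".join(out)
```

On the first example below, this sketch returns "Hello world." followed by a newline from the trailing \par; the font table group is dropped because it opens with \fonttbl.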

Escape sequences like \'XX (a hex-encoded byte) are decoded using the document's declared codepage; \uN (where N is a signed 16-bit decimal number) is decoded as Unicode. Line breaks come from the \par and \line control words. The escaped special characters \\, \{, and \} are unescaped to their literal forms.
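A Python sketch of the escape decoding, under these assumptions: decode_escapes is a hypothetical helper that receives a run of body text with group braces already removed, takes the codepage as a Python codec name, and ignores every control word except \u and \uc. It illustrates four details of the RTF specification: N in \uN is a signed 16-bit decimal (negative values wrap by adding 65536), each \uN is followed by \ucN fallback characters (one by default) that a Unicode-aware reader skips, consecutive \'XX bytes have to be decoded together so multi-byte codepages such as Shift-JIS come out right, and characters above U+FFFF arrive as a pair of \uN surrogates.

```python
import re

TOKEN = re.compile(
    r"\\'([0-9a-fA-F]{2})"    # \'XX: one byte in the document codepage
    r"|\\([a-z]+)(-?\d+)? ?"  # control word such as \u8364 or \uc1, optional delimiter space
    r"|\\(.)"                 # control symbol; only \\ \{ \} produce output here
    r"|([^\\]+)"              # plain text (assumed to contain no group braces)
)


def decode_escapes(body: str, codepage: str = "cp1252") -> str:
    out = []
    pending = bytearray()  # consecutive \'XX bytes, decoded together for multi-byte codepages
    uc = 1                 # \ucN: fallback characters after each \uN (RTF default is 1)
    skip = 0               # fallback characters still to be skipped

    def flush() -> None:
        if pending:
            out.append(pending.decode(codepage, errors="replace"))
            pending.clear()

    for m in TOKEN.finditer(body):
        hexbyte, word, param, symbol, text = m.groups()
        if hexbyte is not None:
            if skip:
                skip -= 1  # a \'XX counts as one fallback character
            else:
                pending.append(int(hexbyte, 16))
            continue
        flush()
        if word == "u" and param is not None:
            n = int(param)  # signed 16-bit decimal: negative values wrap around
            out.append(chr(n + 65536 if n < 0 else n))
            skip = uc
        elif word == "uc" and param is not None:
            uc = int(param)
        elif symbol is not None:
            if skip:
                skip -= 1
            elif symbol in "\\{}":
                out.append(symbol)
        elif text is not None:
            dropped = min(skip, len(text))  # the fallback may be plain characters
            skip -= dropped
            out.append(text[dropped:])
        # Every other control word is ignored in this sketch.
    flush()
    # Code points above U+FFFF arrive as two \uN surrogates; re-pair them via UTF-16.
    return "".join(out).encode("utf-16-le", "surrogatepass").decode("utf-16-le", "replace")
```

For example, decode_escapes(r"\u8364?") yields "€": the "?" is the ANSI fallback for readers that don't understand \u, so it is skipped.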

Examples

Input:
{\rtf1\ansi\deff0
{\fonttbl{\f0 Times New Roman;}}
\b Hello\b0  \i world\i0 .\par
}

Output:
Hello world.

Input:
{\rtf1 Line 1\par Line 2\par}

Output:
Line 1
Line 2

FAQ

Does it preserve formatting like bold or italic?

No, by design. The output is plain text only. To keep the styling, open the RTF in Word or another word processor; the ANSI-to-HTML and Markdown converters serve that purpose for their own source formats.

Are embedded images extracted?

No. Image data inside RTF is stored as hexadecimal text (or as raw bytes after a \binN control word) inside {\pict ...} groups. The converter drops the picture data because the text-extraction goal doesn't need it.
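Dropping a {\pict ...} group means finding its closing brace, which plain brace counting can get wrong. A minimal Python sketch, assuming the RTF was read as Latin-1 so that one character corresponds to one byte: skip_group (a hypothetical name) counts braces, steps over control symbols such as \{ and \}, and jumps over the N raw bytes that follow a \binN control word, since those bytes may themselves contain brace characters.

```python
import re

BIN = re.compile(r"\\bin(\d+) ?")  # \binN: N raw bytes follow the (optional) delimiter space


def skip_group(rtf: str, start: int) -> int:
    """Return the index just past the group whose "{" is at rtf[start]."""
    depth = 0
    i = start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            m = BIN.match(rtf, i)
            if m:
                # Jump over the binary payload without inspecting it.
                i = m.end() + int(m.group(1))
                continue
            # Skip the backslash and the next character, so \{ \} \\ never count as braces.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced group")
```

The caller resumes scanning at the returned index, so nothing inside the picture group reaches the output.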

What if the RTF uses non-Latin characters?

\uN Unicode escapes are decoded correctly. \'XX legacy hex escapes are decoded using the codepage declared in the header (typically \ansicpg1252). If the codepage is missing, the converter falls back to Windows-1252.
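A Python sketch of the codepage lookup with the Windows-1252 fallback. Assumptions: document_codepage is a hypothetical name, only the header before the first nested group is searched, and \ansicpgN is mapped to Python's cpN codec name, which covers the common Windows codepages (1250 to 1258, 932, 936, 949, 950). In this sketch, a codepage Python doesn't recognise also falls back to Windows-1252.

```python
import codecs
import re

ANSICPG = re.compile(r"\\ansicpg(\d+)")


def document_codepage(rtf: str, default: str = "cp1252") -> str:
    # Search only the header: the text before the first nested group (usually {\fonttbl).
    end = rtf.find("{", 1)
    header = rtf if end == -1 else rtf[:end]
    m = ANSICPG.search(header)
    if m:
        name = f"cp{m.group(1)}"
        try:
            codecs.lookup(name)  # does Python have a codec for this codepage?
            return name
        except LookupError:
            pass
    return default
```

Bytes collected from \'XX escapes are then decoded with the returned name; for example, bytes([0xE9]).decode("cp1251") gives "й", while the same byte in Windows-1252 is "é".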
