File Magic Bytes

Detect file type from the first few bytes.

Overview

The File Magic Bytes tool identifies the format of a file by reading the first few bytes — the "magic number" that nearly every binary format embeds in its header. Paste a hex prefix, drop in a file or type a known signature like "PK" and the tool returns the most likely format along with a short description.

This is useful when investigating an attachment with no extension, debugging a corrupted upload or writing a server-side validator that doesn't trust the file extension. Typical questions it answers include "file signature PK 03 04", "magic bytes for PDF", "how to detect file type from header" and "file format hex signatures list".

How it works

The detection table maps known byte prefixes to file format names. Most binary formats have a fixed sequence at offset zero, or occasionally a few bytes in (MP4, for example, places "ftyp" at offset 4): "%PDF-" for PDF, "PK\x03\x04" for ZIP and its descendants, "\xFF\xD8\xFF" for JPEG, "\x89PNG\r\n\x1a\n" for PNG.

Detection is a longest-prefix match: when two signatures share the same opening bytes, the longer, more specific one wins. The tool accepts hex (with or without a 0x prefix), uploaded file bytes (read entirely client-side) and ASCII strings. It returns the canonical format name plus any common aliases.
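As a rough illustration of the matching rule described above, here is a minimal Python sketch. It is not the tool's actual code: the SIGNATURES table is a tiny hand-picked subset, and the names parse_hex and detect are hypothetical. The JFIF entry is included only to show two signatures that share the same opening bytes.

```python
from typing import Optional

# Illustrative subset only; a real detection table would be much larger.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"\xff\xd8\xff": "JPEG image",
    b"\xff\xd8\xff\xe0": "JPEG image (JFIF)",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"II*\x00": "TIFF image, little-endian",
}


def parse_hex(text: str) -> bytes:
    """Turn input such as '0x89 0x50 4E 47' into raw bytes."""
    tokens = text.replace(",", " ").split()
    # Strip an optional 0x prefix from each token, then join the digits.
    digits = "".join(t[2:] if t.lower().startswith("0x") else t for t in tokens)
    return bytes.fromhex(digits)  # raises ValueError on malformed hex


def detect(data: bytes) -> Optional[str]:
    """Return the format whose signature is the longest matching prefix."""
    # Longer signatures are tried first, so the most specific match wins.
    for signature in sorted(SIGNATURES, key=len, reverse=True):
        if data.startswith(signature):
            return SIGNATURES[signature]
    return None
```

With this table, detect(parse_hex("FF D8 FF E0")) picks the JFIF entry over the shorter generic JPEG one, while "FF D8 FF DB" falls back to plain JPEG.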

Examples

25 50 44 46 2D        →  PDF document  ("%PDF-")

50 4B 03 04           →  ZIP archive (also DOCX, XLSX, JAR, APK)

89 50 4E 47 0D 0A 1A 0A  →  PNG image

49 49 2A 00           →  TIFF image, little-endian byte order

FAQ

Why is the file extension not enough?

Extensions can be wrong, missing or deliberately misleading. Magic bytes are inside the file itself and reflect the actual structure, so a .txt file containing a PDF will still be detected as a PDF.

Why do ZIP, DOCX, XLSX and JAR share the same header?

Modern Office formats and Java archives are ZIP files under the hood, with a specific internal directory layout. The header is the same; further inspection of the contained file list distinguishes them.
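The "further inspection" step could look something like the Python sketch below, which uses the standard zipfile module to read the archive's file list. The marker paths are the conventional entry names for each container, but the MARKERS table and the classify_zip function are assumptions made for illustration, not the tool's own logic.

```python
import io
import zipfile

# Entry names that conventionally identify each ZIP-based container.
# APK is checked before JAR because signed APKs may also carry a manifest
# under META-INF.
MARKERS = [
    ("word/document.xml", "DOCX"),
    ("xl/workbook.xml", "XLSX"),
    ("AndroidManifest.xml", "APK"),
    ("META-INF/MANIFEST.MF", "JAR"),
]


def classify_zip(data: bytes) -> str:
    """Name a ZIP-based format by looking at the archive's file list."""
    # zipfile raises BadZipFile if the bytes are not a readable archive.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
    for marker, label in MARKERS:
        if marker in names:
            return label
    return "ZIP"
```

Note that this needs the whole archive rather than just the header, because the ZIP directory is stored at the end of the file.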

Can magic bytes be forged?

Yes, easily. A file can start with PDF bytes but contain anything afterwards. The detection is a strong first signal, not a security guarantee. Server-side validation should still parse the file with a real decoder.
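To show what one step beyond the magic bytes might look like, the Python sketch below checks that a PNG's first chunk is a well-formed IHDR with a matching CRC. It is still far from a full decoder, and the function name png_header_is_consistent is hypothetical; a production validator would hand the file to a complete parsing library.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header_is_consistent(data: bytes) -> bool:
    """Check the PNG signature plus the length, type and CRC of the first chunk."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    # First chunk: 4-byte length, 4-byte type, 13 bytes of IHDR data, 4-byte CRC.
    chunk = data[8:8 + 25]
    if len(chunk) < 25:
        return False
    length, chunk_type = struct.unpack(">I4s", chunk[:8])
    if length != 13 or chunk_type != b"IHDR":
        return False
    (stored_crc,) = struct.unpack(">I", chunk[21:25])
    # The CRC covers the chunk type and data, but not the length field.
    return zlib.crc32(chunk[4:21]) == stored_crc
```

A file that merely starts with the eight PNG signature bytes and then contains arbitrary data fails this check, even though a prefix match alone would call it a PNG.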

What about plain text files?

Text files have no fixed header. The tool falls back to encoding detection: a byte order mark (BOM) identifies UTF-8, UTF-16 LE or UTF-16 BE, and anything else is reported as "plain text or ambiguous".
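The BOM fallback can be sketched in a few lines of Python using the constants from the standard codecs module. The function name detect_text_encoding is hypothetical, and the sketch covers only the three encodings named above.

```python
import codecs

# Only the encodings mentioned above. UTF-32 is not handled here; its
# little-endian BOM begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_text_encoding(data: bytes) -> str:
    """Return the encoding implied by a leading BOM, or a fallback label."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "plain text or ambiguous"
```

Text without a BOM, which covers most UTF-8 files in practice, lands in the fallback branch.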

How many bytes do you need?

For most formats, the first 8 to 16 bytes are enough. Some formats (Mach-O fat binaries, certain video containers) need 32 or 64 bytes to disambiguate variants. The tool will read up to the first 64 bytes.
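If you are building something similar yourself, capping the read is straightforward. The Python sketch below assumes a 64-byte limit to mirror the figure above; read_header and MAX_HEADER_BYTES are hypothetical names.

```python
import io
from typing import BinaryIO

MAX_HEADER_BYTES = 64  # assumed cap, mirroring the limit described above


def read_header(stream: BinaryIO, limit: int = MAX_HEADER_BYTES) -> bytes:
    """Read at most `limit` bytes from the start of a binary stream."""
    return stream.read(limit)


# Example with an in-memory stream: a PNG signature followed by padding.
sample = io.BytesIO(b"\x89PNG\r\n\x1a\n" + bytes(1000))
header = read_header(sample)
```

The same function works on a real file opened with open(path, "rb"), and short files simply return fewer bytes than the limit.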
