Link Extractor

Extract every link from pasted HTML or text.

Overview

The Link Extractor scans pasted HTML or plain text and pulls out every URL it can identify: href attributes, src attributes, plain-text URLs, and Markdown link syntax. It returns a deduplicated, sortable list with the anchor text or surrounding context for each link. You can then categorise links as internal or external, classify them by any status pattern (such as 200 or 3xx) that appears in the URL itself, or filter them by domain.

It is useful for SEO auditors, and for developers who need to extract every link from an HTML page or pull URLs out of plain text. Reach for it when building site audits, harvesting URLs from documentation for redirect mapping, or measuring a page's external link footprint.

How it works

The extractor walks the input with a permissive parser: it accepts full HTML documents, fragments, plain text with embedded URLs, and Markdown. For HTML it inspects <a>, <link>, <img>, <script>, <iframe>, <source>, and <video>. For plain text it uses a URL-shaped regex that recognises common schemes (http, https, ftp, mailto, tel).
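The tool's own parser is not shown here, so the following is only a minimal Python sketch of the same idea, built on the standard library's html.parser module. URL_ATTRS, TEXT_URL, LinkCollector and extract are hypothetical names; the mapping of each element to the attribute it is read from is an assumption, and the regex is a rough stand-in for a "URL-shaped" pattern covering the schemes listed above. The regex pass also picks up absolute URLs inside Markdown links and bare URLs in prose.

```python
import re
from html.parser import HTMLParser

# Assumed mapping of the elements named above to the attribute that carries the URL
URL_ATTRS = {
    "a": ("href",),
    "link": ("href",),
    "img": ("src",),
    "script": ("src",),
    "iframe": ("src",),
    "source": ("src",),
    "video": ("src",),
}

# Rough URL-shaped pattern for http, https, ftp, mailto and tel.
# Stops at whitespace, quotes, angle brackets, ")" and "]" so Markdown links come out clean.
TEXT_URL = re.compile(r"(?:https?://|ftp://|mailto:|tel:)[^\s<>\"')\]]+", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects (url, element, anchor_text) tuples from HTML markup."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._open_a = None  # index in self.links of the <a> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and name in URL_ATTRS.get(tag, ()):
                self.links.append((value.strip(), tag, ""))
                if tag == "a":
                    self._open_a = len(self.links) - 1
                    self._text = []

    def handle_data(self, data):
        if self._open_a is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_a is not None:
            url, element, _ = self.links[self._open_a]
            # Collapse runs of whitespace in the visible anchor text
            anchor = " ".join("".join(self._text).split())
            self.links[self._open_a] = (url, element, anchor)
            self._open_a = None


def extract(text):
    """Return links from HTML attributes, then any extra URLs found in the raw text."""
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    links = list(parser.links)
    seen = {url for url, _, _ in links}
    for match in TEXT_URL.finditer(text):
        # Trailing sentence punctuation is almost never part of the URL
        url = match.group().rstrip(".,;:!?")
        if url not in seen:
            links.append((url, "text", ""))
            seen.add(url)
    return links
```

Excluding ")" keeps [anchor](url) targets clean at the cost of truncating URLs that legitimately contain parentheses, and relative Markdown targets such as [docs](guide.md) are not caught by this sketch.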

Output includes the URL, the element type or source context, and (for <a> elements) the visible anchor text. URLs are deduplicated case-insensitively on host; path case is preserved because the path may be case-significant on the destination server.
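One way to express that dedup rule, sketched with Python's urllib.parse: build a comparison key that lower-cases the scheme and host but leaves the path, query string and fragment exactly as written. The helper names dedup_key and dedupe are illustrative only, not the tool's API.

```python
from urllib.parse import urlsplit, urlunsplit


def dedup_key(url):
    """Comparison key: scheme and host lower-cased, everything else verbatim."""
    parts = urlsplit(url)
    # Keep any userinfo ("user@") as written; only host[:port] is case-insensitive
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment))


def dedupe(urls):
    """Return urls in first-seen order, dropping later entries with the same key."""
    seen = set()
    unique = []
    for url in urls:
        key = dedup_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

With this key, https://Example.COM/Docs and https://example.com/Docs collapse into one entry, while https://example.com/docs stays separate, and so do URLs that differ only in their query string.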

Examples

A pasted HTML page yields 84 links spanning 12 unique external domains.
A Markdown README is parsed to surface [anchor](url) and reference-style [label]: url links.
A plain-text changelog reveals 30 release URLs embedded in prose.
Filtering to external links strips the internal ones and reveals which third-party sites the page links to most.
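Internal vs external classification needs a base URL for the pasted page, both to resolve relative links such as /about and to decide which host counts as internal. The sketch below assumes a hypothetical page_url argument and treats only an exact host match as internal, so www.example.com and example.com count as different sites here; the real tool may group them differently. Non-web schemes are skipped, which is also an assumption.

```python
from collections import Counter
from urllib.parse import urljoin, urlsplit


def classify(urls, page_url):
    """Split urls into (internal, external) lists relative to page_url's host."""
    page_host = urlsplit(page_url).hostname
    internal, external = [], []
    for url in urls:
        absolute = urljoin(page_url, url)  # resolves "/about" against the page
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # assumption: mailto:, tel:, ftp: are neither internal nor external
        if parts.hostname == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


def top_external_domains(urls, page_url, n=5):
    """Count external links per host and return the n most frequent."""
    _, external = classify(urls, page_url)
    return Counter(urlsplit(u).hostname for u in external).most_common(n)
```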

FAQ

Does the extractor follow links?

No, it only inspects what's pasted. Pair it with a redirect tester to fetch each URL.

Are mailto and tel URLs included?

Yes, by default. mailto: and tel: links are valid URIs under the generic syntax of RFC 3986. Filter them out if you only want web links.
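If you post-process the exported list yourself, filtering to web links can be as simple as checking the scheme. This sketch assumes only http and https count as web links and keeps relative references (no scheme at all), since they resolve against the page's own URL; web_links_only is a hypothetical helper, and the scheme pattern follows the grammar in RFC 3986 section 3.1.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def web_links_only(urls, allowed=("http", "https")):
    """Keep URLs whose scheme is in `allowed` (assumed http/https), plus relative ones."""
    kept = []
    for url in urls:
        match = SCHEME.match(url)
        # No scheme means a relative reference such as "/about"; keep it
        if match is None or match.group(1).lower() in allowed:
            kept.append(url)
    return kept
```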

How does dedup handle query strings?

Each unique full URL is preserved. https://example.com/?a=1 and https://example.com/?a=2 are reported separately because they may resolve differently.

What about JavaScript-generated links?

The extractor only sees the static markup you paste. Links rendered by client-side JavaScript are invisible to it unless you serialise the rendered DOM and paste that instead.
