Canonical URL Helper
Normalise any URL — force https, drop tracking params, sort query.
Overview
The Canonical URL Helper normalises any URL into a single canonical form — forcing HTTPS, lowercasing the host, stripping default ports and known tracking parameters (utm_*, fbclid, gclid), sorting query keys alphabetically, and collapsing redundant slashes. Output is the value you would put inside a <link rel="canonical"> tag.
Reach for it when consolidating duplicate-URL issues flagged by Google Search Console, generating canonical tags for the pages in a sitemap, or working out how to normalise URLs for SEO and strip tracking parameters consistently. It is useful for SEO analysts, link-building teams, and developers wiring canonical logic into a CMS.
How it works
Search engines can treat URLs that differ only by case, trailing slash, scheme, or parameter order as separate documents unless told otherwise via a rel="canonical" link. The helper parses each URL according to the WHATWG URL standard and then applies an SEO-focused normalisation pass on top: protocol upgrade, host lowercase, default-port removal (:80, :443), redundant-slash collapsing, query-string filter list, query-key sort, and optional trailing-slash policy.
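The structural steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the helper's actual code: the function name canonicalise is hypothetical, Python's urllib.parse is not a WHATWG-conformant parser, and the tracking filter and trailing-slash policy are left out here to keep the example short.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Ports the helper is described as removing (:80 and :443).
DEFAULT_PORTS = {80, 443}


def canonicalise(url: str) -> str:
    """Hypothetical sketch of the structural normalisation steps."""
    parts = urlsplit(url)

    # Host lowercase: .hostname already returns the host in lower case.
    # Assumption: no userinfo and no IPv6 literals; both are ignored here.
    host = (parts.hostname or "").lower()

    # Default-port removal: keep the port only when it is non-default.
    port = parts.port
    netloc = host if port is None or port in DEFAULT_PORTS else f"{host}:{port}"

    # Collapse runs of slashes in the path; an empty path becomes "/".
    path = re.sub(r"/{2,}", "/", parts.path) or "/"

    # Query-key sort. Note that parse_qsl/urlencode decode and re-encode
    # values, so the percent-encoding of unusual characters may change.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    query = urlencode(pairs)

    # Protocol upgrade: always emit https. The fragment is dropped.
    return urlunsplit(("https", netloc, path, query, ""))


print(canonicalise("HTTP://Example.com:80/Blog/?id=5"))
print(canonicalise("https://example.com/a?b=2&a=1"))
```

The two calls print https://example.com/Blog/?id=5 and https://example.com/a?a=1&b=2. Path case is left alone on purpose, since only the host is safe to lowercase.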
The tracking-parameter filter list covers UTM tags, ad-platform click IDs (gclid, fbclid, msclkid, wbraid), session IDs, and a few Shopify/HubSpot defaults. You can add custom keys to ignore and choose whether to preserve fragments (search engines usually drop them).
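A filter of this kind might look like the following Python sketch. The names strip_tracking, extra_keys, and keep_fragment are hypothetical, and the block list contains only the keys named above; the session-ID and Shopify/HubSpot entries are not spelled out in this document, so they are omitted rather than guessed.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Only the keys named in the text; the real list is described as longer.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid", "msclkid", "wbraid"}


def strip_tracking(url: str, extra_keys=(), keep_fragment: bool = False) -> str:
    """Hypothetical sketch: drop tracking parameters from the query string."""
    parts = urlsplit(url)

    # Merge the built-in list with any caller-supplied custom keys.
    blocked = TRACKING_KEYS | {key.lower() for key in extra_keys}

    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in blocked
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]

    # Fragments are dropped unless the caller asks to preserve them.
    fragment = parts.fragment if keep_fragment else ""
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), fragment)
    )


print(strip_tracking("https://example.com/a?b=2&a=1&fbclid=xyz"))
print(strip_tracking("https://example.com/?ref=x&id=5", extra_keys=["ref"]))
```

The first call prints https://example.com/a?b=2&a=1 and the second prints https://example.com/?id=5. Key matching is case-insensitive in this sketch, which is an assumption rather than documented behaviour.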
Examples
HTTP://Example.com:80/Blog/?utm_source=twitter&id=5 → https://example.com/Blog/?id=5
https://example.com/path//double///slash → https://example.com/path/double/slash
https://example.com/a?b=2&a=1&fbclid=xyz → https://example.com/a?a=1&b=2
https://example.com/page/#section → https://example.com/page/ (fragment stripped)
FAQ
Should canonicals always be absolute URLs?
Yes. Google explicitly recommends absolute URLs in rel="canonical" to avoid ambiguity, even though relative URLs are technically valid.
Does the helper preserve case in the path?
Yes. Path case is preserved because servers may treat it as significant. Only the host is lowercased, which is safe because RFC 3986 defines the host as case-insensitive.
What about trailing slashes?
You can pick a policy — always add, always strip, or leave untouched. Pick one and stick to it sitewide; mixing causes its own duplicate-content problems.
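The three policies could be expressed roughly as below. This is a sketch under assumptions: the function name apply_trailing_slash and the policy strings "add", "strip", and "keep" are made up for illustration, and a production version would probably skip file-like paths such as /report.pdf when adding a slash.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_trailing_slash(url: str, policy: str = "keep") -> str:
    """Hypothetical sketch of a sitewide trailing-slash policy."""
    if policy not in {"add", "strip", "keep"}:
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "keep":
        return url  # leave the URL untouched

    parts = urlsplit(url)
    path = parts.path or "/"

    if policy == "add" and not path.endswith("/"):
        path += "/"
    elif policy == "strip" and path != "/":
        # Never strip the root slash itself.
        path = path.rstrip("/") or "/"

    # Query and fragment are carried over unchanged.
    return urlunsplit(parts._replace(path=path))


print(apply_trailing_slash("https://example.com/blog", "add"))
print(apply_trailing_slash("https://example.com/blog/?id=5", "strip"))
```

These print https://example.com/blog/ and https://example.com/blog?id=5. The root path is kept as "/" under every policy.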
Will this also remove session cookies in the path?
Not by default. URL-embedded session IDs (;jsessionid=...) are path parameters, and they are removed only if you enable the matrix-parameter stripping option. Cookies sent in HTTP headers are never touched, because they are not part of the URL.
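As a rough illustration of what matrix-parameter stripping involves, the Python sketch below removes everything after the first ";" in each path segment. The function name strip_matrix_params is hypothetical, and the helper's own option may behave differently, for example by removing only known keys such as jsessionid.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_matrix_params(url: str) -> str:
    """Hypothetical sketch: drop ';key=value' parameters from path segments."""
    parts = urlsplit(url)

    # urlsplit leaves ";params" inside the path, so handle each segment:
    # keep only the text before the first ";".
    segments = [segment.split(";", 1)[0] for segment in parts.path.split("/")]
    path = "/".join(segments)

    # Query and fragment are carried over unchanged.
    return urlunsplit(parts._replace(path=path))


print(strip_matrix_params("https://example.com/shop/cart;jsessionid=ABC123?id=5"))
```

This prints https://example.com/shop/cart?id=5.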