What rel=canonical actually does (and what it doesn't)
A canonical tag is a single line of HTML in your page head that tells search engines, "if you find duplicate or near-duplicate versions of this content, treat this URL as the original." That's it. It looks like <link rel="canonical" href="https://example.com/page" /> and it sits quietly in your head tag, doing more work than almost any other on-page element. The catch most people miss is that it's a hint, not a directive. Google can (and routinely does) ignore your canonical if the other signals don't add up.
Why does duplicate content even matter? Because when Google finds the same content at five different URLs (with tracking parameters, with and without trailing slashes, http and https, www and non-www, mobile subdomain versions), it has to pick one to rank. If you don't tell it which version to consolidate signals around, it'll guess. Sometimes it guesses right. Often it picks the URL with the most internal links, the shortest path, or the one that happens to load first during crawl. None of those are necessarily the URL you want indexed.
A canonical tag checker pulls the rel=canonical from the page source (and from HTTP headers, since you can also send canonicals via Link header), shows you exactly what URL is declared, and flags whether it points to itself, to another page, to a redirect, to a 404, or to a noindex page. The last three are quiet disasters that erode rankings without throwing errors.
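To make the extraction step concrete, here is a minimal sketch of how a checker might read canonicals from both places, using only Python's standard library. The function and class names are made up for illustration, and the Link header parsing is a simplification that handles the common case rather than the full header grammar.

```python
from html.parser import HTMLParser
import re


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, so split before comparing.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonicals_from_html(html):
    """Return every canonical href declared in the HTML source."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonicals


def canonicals_from_link_header(header_value):
    """Return canonical targets found in an HTTP Link header value."""
    found = []
    # Each entry looks like: <https://example.com/page>; rel="canonical"
    for url, params in re.findall(r'<([^>]*)>([^,]*)', header_value):
        if re.search(r'rel\s*=\s*"?[^";]*\bcanonical\b', params, re.IGNORECASE):
            found.append(url)
    return found
```

Returning a list rather than a single value is deliberate: a page that declares more than one canonical is itself something a checker should report.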
Self-referencing canonicals: boring but essential
Every indexable page should have a self-referencing canonical that points to its own clean URL. This sounds obvious, but plenty of sites skip it on the assumption that "if there are no duplicates, why bother?" The answer is that duplicates appear whether you create them or not. Someone shares your URL with a UTM parameter. An affiliate link adds ?ref=partner. Facebook tacks on ?fbclid=. Suddenly you have ten URLs Google might index, all serving identical content.
With a self-referencing canonical in place, all those parameter variants tell Google, "ignore the noise, the real URL is this one." Without it, you're relying on Google to figure it out on its own, and you can no longer fall back on Search Console: Google retired the URL Parameters tool in 2022. The self-canonical now does the work that used to require manual parameter rules in GSC.
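If your templates build the canonical from the requested URL, the "clean URL" has to be computed somewhere. A rough sketch of that step follows; the list of tracking parameters is an assumption (gclid is included alongside the examples above) and should be adjusted to whatever your own campaigns append.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking parameters; extend this for your own campaigns and partners.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid", "ref"}


def clean_url(url):
    """Return the URL without tracking parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_NAMES
        and not name.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the self-referencing canonical tag for a requested URL."""
    return f'<link rel="canonical" href="{clean_url(url)}" />'
```

For example, clean_url("https://example.com/page?utm_source=news&fbclid=abc") gives back https://example.com/page, so every tracked variant declares the same canonical.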
Cross-domain canonicals and content syndication
Cross-domain canonicals are where things get interesting. Say you publish an article on your blog and syndicate it to Medium, LinkedIn, and an industry publication. By default, those republished versions compete with your original for rankings, and bigger domains often outrank you on your own content. The fix is asking the syndication partner to add rel="canonical" pointing back to your URL on their copy.
Whether they'll honor the request depends on the platform. Medium supports it through their import tool. Most CMS platforms expose a canonical field you can ask the editor to fill. LinkedIn articles offer no such option. When cross-domain canonicals are set correctly, link equity and ranking signals consolidate to your domain even when the content lives elsewhere. When they aren't, you're effectively giving away authority every time you syndicate.
One nuance: cross-domain canonicals only work when the content is actually substantially similar. If you rewrite the article significantly for the syndication partner, the canonical relationship breaks down logically and Google may ignore it.
The mistakes that quietly tank pages
The most common canonical mistake is pointing a canonical at a URL that is noindexed, blocked by robots.txt, or returning a 404. You're telling Google, "the real version of this page is over there," while simultaneously telling it that "over there" can't be indexed, can't be crawled, or doesn't exist. The likely result: neither version ranks, and your traffic to that piece evaporates. A canonical checker catches this in seconds; without one, you might not notice until rankings drop.
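The classification a checker performs can be sketched as a small decision function. This version assumes a crawler has already requested the canonical target and recorded what it saw; the TargetInfo record and the label strings are hypothetical, not the output of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class TargetInfo:
    """What a crawler observed when it requested the canonical target."""
    status: int                   # HTTP status code of the canonical URL
    noindex: bool = False         # robots meta tag or X-Robots-Tag says noindex
    robots_blocked: bool = False  # robots.txt disallows the canonical URL


def classify_canonical(page_url, canonical_url, target):
    """Return a short label describing the health of a declared canonical."""
    # Check the broken cases first: they matter more than where the tag points.
    if 300 <= target.status < 400:
        return "points to a redirect"
    if target.status >= 400:
        return "points to an error page"
    if target.robots_blocked:
        return "points to a URL blocked by robots.txt"
    if target.noindex:
        return "points to a noindex page"
    if canonical_url == page_url:
        return "self-referencing"
    return "points to another page"
```

The ordering is a design choice: a self-referencing canonical on a noindexed page is still reported as a noindex problem, because that is the part that needs fixing.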
Conflicting canonicals are another regular offender. WordPress plugins, Shopify apps, and JavaScript frameworks sometimes inject their own canonical alongside the one you set in your theme or template. Now the page has two canonical tags pointing to different places, and you can't count on Google honoring the one you meant: it may pick either or disregard both. The http vs https mismatch is similar: your canonical declares https but you're still serving the http version, or vice versa, so the declared canonical disagrees with the URL Google actually crawled and Google has to resolve the conflict itself.
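Both problems are easy to detect once you have the list of canonical hrefs found on a page. The following is a sketch under that assumption; the function name and the wording of the messages are illustrative only.

```python
from urllib.parse import urljoin, urlsplit


def find_canonical_conflicts(page_url, canonical_hrefs):
    """Return a list of human-readable problems with the declared canonicals."""
    problems = []
    # Resolve relative hrefs against the page URL before comparing them.
    resolved = [urljoin(page_url, href) for href in canonical_hrefs]
    distinct = sorted(set(resolved))
    if len(distinct) > 1:
        problems.append("multiple canonicals disagree: " + ", ".join(distinct))
    page_scheme = urlsplit(page_url).scheme
    for target in distinct:
        if urlsplit(target).scheme != page_scheme:
            problems.append(
                f"scheme mismatch: page is {page_scheme}, canonical is {target}"
            )
    return problems
```

Two tags that resolve to the same URL are treated as harmless here, which is why the hrefs are resolved and de-duplicated before they are compared.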
Then there's parameter handling. URLs like /products?color=red&size=large should typically canonicalize to /products unless the parameterized version is a unique landing page worth indexing on its own (filtered category pages with significant search volume, for example). Get this wrong and you either bloat the index with thin parameter pages or hide pages that should rank.
Canonical, noindex, and robots.txt: pick the right tool
These three get conflated constantly, and they do entirely different things. Canonical says, "consolidate ranking signals to this URL." Noindex says, "don't show this page in search results." Robots.txt says, "don't crawl this URL at all." Use canonical for duplicates you want indexed under one URL. Use noindex for pages you want crawled but never shown (thank-you pages, internal search results, low-value tag archives). Use robots.txt to prevent crawling entirely (admin areas, faceted navigation that explodes into millions of URLs).
Mixing them creates problems. If you block a page in robots.txt, Google can't see your noindex tag because it never crawls the page to read it. If you canonical to a noindexed page, you risk dropping both URLs from the index. Pick one signal per outcome and let it do its job.
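The robots.txt and noindex conflict can be tested without touching the live site. Here is a small sketch using Python's built-in urllib.robotparser; it assumes you already have the robots.txt contents as a string, and the helper name is made up. Note that the standard-library parser does simple prefix matching, so results for wildcard rules may differ from how Google interprets them.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_readable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the crawler may fetch the URL and so can see its noindex."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

If this returns False for a page that carries a noindex tag, the tag is doing nothing: the crawler is never allowed to load the page that contains it.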
How Google chooses when you don't (or choose wrong)
When canonicals are missing or contradicted, Google runs its own canonicalization process. It looks at internal linking patterns (which version do you link to most?), sitemap inclusion, redirects, hreflang clusters, HTTPS vs HTTP, URL length and cleanliness, and inbound links. The URL with the strongest combined signals wins, and the others get filtered as duplicates.
Search Console's URL Inspection tool will tell you the user-declared canonical (what you said) and the Google-selected canonical (what Google actually chose). When those differ, you have a problem worth investigating. Often it means your signals are inconsistent: your sitemap lists one URL, your internal links point to another, and your canonical declares a third. Google picks the one with the cleanest signal, which may not be the one you wanted.
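You can run a rough version of this consistency check yourself before Google does. The sketch below assumes you have three inputs from your own crawl of one duplicate cluster: the declared canonical, the URLs listed in your sitemap, and the targets of your internal links. It only compares those three signals and does not attempt to reproduce Google's actual selection logic.

```python
from collections import Counter


def signal_report(declared_canonical, sitemap_urls, internal_link_targets):
    """Compare the declared canonical with sitemap and internal-link signals."""
    link_counts = Counter(internal_link_targets)
    # The version you link to most often is the one your site "votes" for.
    most_linked = link_counts.most_common(1)[0][0] if link_counts else None
    in_sitemap = declared_canonical in set(sitemap_urls)
    return {
        "declared": declared_canonical,
        "most_linked": most_linked,
        "in_sitemap": in_sitemap,
        "consistent": in_sitemap and most_linked == declared_canonical,
    }
```

A report with "consistent" set to False is the situation described above: the sitemap, the internal links, and the canonical are not all pointing at the same URL.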
Pagination: the rel=prev/next afterlife
Google announced in 2019 that it no longer uses rel=prev/next as an indexing signal, but the question of how to handle paginated archives still confuses people. The current advice is straightforward: each paginated page should self-canonicalize (page 2 canonicals to page 2, not to page 1). Don't canonical all paginated pages to the first page, because that hides the deeper content from indexing entirely.
If you genuinely want only page 1 indexed, use noindex on subsequent pages while keeping the links followable. If you want all pages indexed but consolidated under a category landing page, use a "view all" URL and canonical the paginated versions to it (assuming the view-all loads acceptably).
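For the default case, where every paginated page self-canonicalizes, the rule can be sketched in a few lines. Two assumptions are baked in here and may not match your site: the page number lives in a query parameter named page, and page 1 is served at the bare archive URL.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def paginated_canonical(url, page_param="page"):
    """Return the self-referencing canonical for a paginated archive URL."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    page = params.get(page_param, "1")
    # Keep only the page number; page 1 is assumed to be the bare archive URL.
    query = "" if page == "1" else urlencode({page_param: page})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

The important property is that page 2 and page 3 produce different canonicals, each pointing at itself, instead of collapsing onto page 1.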
Ecommerce: faceted nav, variants, and the canonical maze
Ecommerce is where canonical tags earn their keep. A single product can exist at /shirts/blue-shirt, /shirts/blue-shirt?color=blue&size=m, and /category/mens/shirts/blue-shirt. Canonical all variants to the primary product URL. For color and size variants that share substantially the same product page, canonical the variants to the parent product unless each variant has unique content worth indexing separately.
Faceted navigation is trickier. Filters like price ranges, ratings, and availability typically generate URLs that should canonical back to the unfiltered category page. But filters that match real search demand (filtering by brand, by category combinations like "women's blue running shoes") often deserve their own indexable pages with self-referencing canonicals. Run a canonical checker across your top categories and you'll usually find a mix of correct, missing, and pointed-the-wrong-direction tags. Fixing them is one of the highest-ROI cleanup jobs on most ecommerce sites.
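One way to encode that split is an allowlist of facets that deserve their own indexable pages, with everything else canonicalized away. The sketch below assumes filters arrive as query parameters, and the allowlist itself (brand and color) is a placeholder; which facets match real search demand is a decision only your keyword data can make.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical allowlist: facets assumed to match real search demand.
INDEXABLE_FACETS = {"brand", "color"}


def facet_canonical(url):
    """Return the canonical URL for a filtered category page."""
    parts = urlsplit(url)
    # Keep allowlisted facets only, sorted so the same filter combination
    # yields the same canonical no matter what order the filters were applied in.
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query)
        if name in INDEXABLE_FACETS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this rule, a URL filtered only by price and rating canonicalizes back to the unfiltered category, while a brand plus color combination keeps a stable, self-referencing URL of its own.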