Technical SEO
Canonical Tag
A canonical tag is an HTML link element placed in the `<head>` of a webpage — `<link rel="canonical" href="https://example.com/page" />` — that tells search engines which URL is the authoritative version when duplicate or near-duplicate content exists across multiple URLs. It was introduced jointly by Google, Microsoft, and Yahoo in 2009 to give site owners a way to consolidate ranking signals.
Canonicals are most often needed when the same content is reachable via multiple URLs: with and without `www`, with `http://` and `https://`, with and without trailing slashes, with tracking parameters (`?utm_source=...`), or with case variations. A canonical tag declares the preferred URL. Search engines treat it as a strong hint rather than a directive, and usually fold the ranking signals from the variants into that single URL.
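The variants listed above can be collapsed onto one preferred form with a small normalizer. The following is a minimal sketch, not a definitive implementation: `toPreferredUrl` and `EXTRA_TRACKING_PARAMS` are hypothetical names, and the preferred form it picks (`https`, no `www.`, lowercase path, no trailing slash, no tracking parameters) is an assumption that should match whatever your site actually serves.

```typescript
// Hypothetical helper: map common URL variants onto one preferred URL.
// Assumption: the site prefers https, no "www.", lowercase paths,
// no trailing slash, and no tracking parameters.
const EXTRA_TRACKING_PARAMS = new Set(["gclid", "fbclid"]);

function toPreferredUrl(raw: string): string {
  const url = new URL(raw); // the parser already lowercases the hostname

  url.protocol = "https:";
  url.hostname = url.hostname.replace(/^www\./, "");
  url.hash = ""; // fragments never reach the server

  // Only safe if the site treats paths case-insensitively.
  let path = url.pathname.toLowerCase();
  if (path.length > 1 && path.endsWith("/")) {
    path = path.slice(0, -1);
  }
  url.pathname = path;

  // Collect keys first so we do not mutate the list while iterating it.
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => keys.push(key));
  for (const key of keys) {
    if (key.startsWith("utm_") || EXTRA_TRACKING_PARAMS.has(key)) {
      url.searchParams.delete(key);
    }
  }

  return url.toString();
}
```

The value this returns is what a page would place in the `href` of its canonical tag. Parameters that change the content, such as pagination, are left untouched.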
A "self-referencing canonical" is a canonical tag that points to the same URL the page lives at. This is the recommended default for every indexable page — it explicitly states "this URL is the canonical version of itself" and prevents accidental misinterpretation when the page is fetched via a slightly different URL form.
In Next.js (App Router), canonicals are set via the `alternates.canonical` field in the `metadata` object or the `generateMetadata` function exported from a layout or page. Metadata is inherited down the route tree, and a child segment that sets `alternates` replaces the parent's value. A common bug in Next.js sites is a root-layout canonical pointing to `/` that propagates to every child page that does not override it, so the entire site canonicalizes to the homepage.
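One way to avoid the inherited root canonical is to have every page build its own value from its route path. This is a sketch under stated assumptions: `SITE_ORIGIN` and `pageMetadata` are hypothetical names, and `CanonicalMetadata` is a local stand-in for the relevant slice of the `Metadata` type from `next`, declared here only so the example runs without the framework installed.

```typescript
// Local stand-in for the part of Next.js's `Metadata` type used here.
// In a real project you would use: import type { Metadata } from "next";
type CanonicalMetadata = {
  alternates?: { canonical?: string };
};

// Assumed production origin; replace with your own.
const SITE_ORIGIN = "https://example.com";

// Hypothetical helper: build a self-referencing canonical for a route path.
function pageMetadata(path: string): CanonicalMetadata {
  const canonical = new URL(path, SITE_ORIGIN).toString();
  return { alternates: { canonical } };
}

// In an App Router page file this would be exported:
//   export const metadata = pageMetadata("/blog/canonical-tags");
const metadata = pageMetadata("/blog/canonical-tags");
```

Calling the helper in each page (or in each `generateMetadata`) means no page depends on a value set in the root layout.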
Validation tools include Screaming Frog's SEO Spider (detects canonical chains, conflicting tags, and non-self-referencing canonicals on key pages), Google Search Console's URL Inspection tool (shows which URL Google chose as canonical, which may differ from the declared canonical), and Ahrefs/SEMrush site audits.
Why it matters in GEO / AI search
In GEO and AI search, canonicals arguably matter more than in traditional SEO, because AI engines that retrieve pages through a search index tend to identify a page by its canonical URL. When an assistant such as ChatGPT cites a source, the URL it surfaces is typically the canonical one rather than whichever variant happened to be retrieved. A misconfigured canonical means the cited URL may not match the page users should land on, hurting both attribution and downstream click-through.
For sites built on Next.js, React, or other frameworks with a root layout, the highest-leverage canonical check is: does every page have a self-referencing canonical, or do they all inherit a root canonical? Run `curl -sL https://example.com/some-deep-page | grep -i canonical` on five random URLs. If they all return the homepage URL, every page on the site is silently telling Google "the homepage is my preferred version", an error that can keep a new site's deeper pages out of the index.
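The same spot check can be scripted so it labels each page rather than leaving you to read raw `grep` output. The sketch below is illustrative only: `extractCanonical`, `classifyCanonical`, and `checkPage` are hypothetical names, the tag is found with a regular expression rather than a real HTML parser, and `checkPage` assumes a runtime with a global `fetch` (for example Node 18 or later).

```typescript
// Return the href of the first <link rel="canonical"> tag, or null.
// Regex-based on purpose: fine for a spot check, not a full HTML parser.
function extractCanonical(html: string): string | null {
  const tags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of tags) {
    if (!/\brel\s*=\s*["']?canonical["']?/i.test(tag)) continue;
    const href = tag.match(/\bhref\s*=\s*["']([^"']+)["']/i);
    if (href) return href[1];
  }
  return null;
}

type Verdict = "self" | "homepage" | "other" | "missing";

// Compare a page's declared canonical with the URL it was fetched from.
function classifyCanonical(pageUrl: string, html: string): Verdict {
  const href = extractCanonical(html);
  if (href === null) return "missing";
  const page = new URL(pageUrl);
  const canonical = new URL(href, pageUrl); // also resolves relative hrefs
  if (canonical.href === page.href) return "self";
  if (canonical.origin === page.origin && canonical.pathname === "/") {
    return "homepage";
  }
  return "other";
}

// Fetch a live page, follow redirects, and classify its canonical.
async function checkPage(pageUrl: string): Promise<Verdict> {
  const response = await fetch(pageUrl, { redirect: "follow" });
  return classifyCanonical(response.url, await response.text());
}
```

A run of `"homepage"` verdicts on deep pages is the inherited-root-canonical symptom described above. Pass clean URLs without tracking parameters, since any parameter makes the comparison report `"other"`.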
Canonicals do not prevent indexing; they consolidate signals. If you want a page out of the index entirely, use `noindex`, not a canonical. The two interact in counterintuitive ways: a page that carries both `noindex` and a canonical to another URL sends conflicting signals, and a canonical that points to a `noindex` page asks search engines to consolidate onto a URL that is excluded from the index. Cross-check both for any URL you want to remove or consolidate.
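The cross-check can be expressed over crawl data as a simple rule set. This is a rough sketch that assumes you already have each page's URL, declared canonical, and `noindex` status (for example from a crawler export); `IndexingSignals` and `findConflicts` are hypothetical names, and the two rules shown are the ones described above rather than an exhaustive audit.

```typescript
// One crawled page. All URLs are assumed to be absolute and already normalized.
type IndexingSignals = {
  url: string;
  canonical: string | null; // declared canonical, or null if none
  noindex: boolean;
};

// Return a human-readable message for each conflicting combination found.
function findConflicts(pages: IndexingSignals[]): string[] {
  const byUrl = new Map<string, IndexingSignals>();
  for (const page of pages) byUrl.set(page.url, page);

  const messages: string[] = [];
  for (const page of pages) {
    // Self-referencing or absent canonicals cannot conflict with noindex.
    if (page.canonical === null || page.canonical === page.url) continue;

    // Rule 1: "drop me from the index" plus "consolidate me elsewhere".
    if (page.noindex) {
      messages.push(`${page.url} is noindex but canonicalizes to ${page.canonical}`);
    }

    // Rule 2: signals are being consolidated onto a page excluded from the index.
    const target = byUrl.get(page.canonical);
    if (target !== undefined && target.noindex) {
      messages.push(`${page.url} canonicalizes to ${page.canonical}, which is noindex`);
    }
  }
  return messages;
}
```

Canonical targets that were not crawled are skipped by the second rule, so an empty result only covers the pages present in the input.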
Examples
Self-referencing canonical (the right default)
Every indexable page emits `<link rel="canonical" href="<own absolute URL>" />`. In Next.js: `alternates: { canonical: 'https://example.com/the-page' }` in the route's `metadata` export or `generateMetadata` return value.
Parameter consolidation
`example.com/products?utm_source=email` and `example.com/products` both carry `<link rel="canonical" href="https://example.com/products" />`. Signals from the tracked URL and the clean URL merge into the canonical version.
Cross-domain canonical
A guest post syndicated to a partner blog can point its canonical back to the original on your domain. Use sparingly: Google treats the tag as a hint and sometimes ignores cross-domain canonicals, picking its own.
Anti-pattern: site-wide root canonical inheritance
Root layout has `canonical: 'https://example.com/'` with no per-page override. Every page inherits it and canonicalizes to the homepage instead of to itself. Result: the site appears to be a one-page site to Google and AI engines.
Related Terms
Content
Duplicate Content
Identical or fairly similar pieces of content on the same or different websites are called duplicate content.
Technical SEO
Hreflang Tag
Refers to tags for specifying the language and geographical targeting of a webpage.
Technical SEO
Noindex Tag
Refers to tags for pages that are not to be indexed by bots.
Technical SEO
Redirect Chain
Refers to cases where a URL redirects through two or more hops before reaching its final destination.

