17 February 2026

Next.js Sitemap, Robots, and Crawlability Debugging Checklist

In Brief

Debug crawlability by reading the signals together: generated routes, sitemap.xml, robots.txt, status codes, redirects, canonicals, and noindex rules. A sitemap is only an output, not proof that the app can render the route or that crawlers are being invited to keep it in the index.

Crawlability problems in Next.js rarely announce themselves politely.

The site deploys. Pages load. The sitemap exists. The robots file exists. Everyone moves on.

Then Search Console starts showing missing URLs, excluded pages, duplicate canonicals, "discovered but not indexed", or routes that were never meant to be public. The technical problem may be one line in robots.txt, a stale sitemap generator, an environment‑specific host, a missing dynamic route, or a canonical helper that quietly normalises the wrong URL.

The fix starts with one principle: discovery surfaces must be generated from the same truth as the site itself. If routes, sitemap entries, canonical URLs, and robots rules are maintained separately, drift is inevitable.

Confirm Which Pages Should Exist

Before debugging the sitemap, establish the intended route set.

For a Next.js site, that may come from:

page files
App Router segments
generated static params
CMS entries
product or category data
redirect maps
service registries
valid‑route JSON files
legacy URL inventories

The sitemap should not be treated as the source of truth. It is an output. If the sitemap includes a page the app cannot render, or omits a page the app depends on, the problem is upstream.

For content‑heavy sites, I like having a generated route list that can be compared with the sitemap, navigation, redirects, and important internal links. The older article on generating `urllist.txt` from a sitemap is a small example of why plain route inventories are useful when debugging.
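A minimal sketch of that comparison, assuming you already have two plain lists: the route paths the app generates and the `<loc>` values from sitemap.xml. The name `diffRouteInventory` and the trailing-slash rule are illustrative assumptions; match them to your own URL conventions.

```typescript
// Hypothetical helper: compare the intended route set with sitemap URLs.
type RouteDiff = {
  missingFromSitemap: string[]; // routes the app serves but the sitemap omits
  notInRouteSet: string[]; // sitemap entries the app does not claim to serve
};

// Assumption: the site uses trailing slashes, so "/about" and "/about/" are one route.
// Full URLs are reduced to their path so host problems can be checked separately.
function toPath(urlOrPath: string): string {
  const path = /^https?:\/\//.test(urlOrPath) ? new URL(urlOrPath).pathname : urlOrPath;
  return path.endsWith('/') ? path : `${path}/`;
}

export function diffRouteInventory(routes: string[], sitemapUrls: string[]): RouteDiff {
  const routeSet = new Set(routes.map(toPath));
  const sitemapSet = new Set(sitemapUrls.map(toPath));
  return {
    missingFromSitemap: [...routeSet].filter((p) => !sitemapSet.has(p)).sort(),
    notInRouteSet: [...sitemapSet].filter((p) => !routeSet.has(p)).sort(),
  };
}

const diff = diffRouteInventory(
  ['/', '/services/example', '/about/'],
  ['https://www.example.com/', 'https://www.example.com/about/', 'https://www.example.com/old-page/'],
);
// diff.missingFromSitemap → ['/services/example/']
// diff.notInRouteSet → ['/old-page/']
```

Both directions matter: a route missing from the sitemap is a discovery gap, while a sitemap entry outside the route set usually points at a stale generator or a retired URL.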

Check Sitemap Inclusion Against Canonical URLs

Every URL in the sitemap should be canonical, indexable, and useful.

Check for:

non‑production hosts
preview or branch URLs
HTTP URLs on an HTTPS site
URLs that redirect
URLs that return 404
URLs with noindex
filtered or parameterised URLs without a strategy
duplicate trailing slash variants
category or pagination URLs that should not be submitted
old routes that should now redirect
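Several of those checks can run against the sitemap URL list alone, before any page is fetched. A minimal sketch, assuming a single production origin and a trailing-slash convention; `lintSitemapUrls`, `PRODUCTION_ORIGIN`, and the problem messages are hypothetical. Redirects, 404s, and noindex need a real HTTP check, so they are left out here.

```typescript
// Hypothetical production origin; substitute your own.
const PRODUCTION_ORIGIN = 'https://www.example.com';

type SitemapIssue = { url: string; problem: string };

// Flags sitemap URLs that cannot be canonical for this site, judging by the URL alone.
// Assumes canonical paths always end in a trailing slash. Invalid URLs make `new URL` throw.
export function lintSitemapUrls(urls: string[]): SitemapIssue[] {
  const issues: SitemapIssue[] = [];
  const seenPaths = new Set<string>();
  const expected = new URL(PRODUCTION_ORIGIN);

  for (const raw of urls) {
    const url = new URL(raw);
    if (url.protocol !== expected.protocol) {
      issues.push({ url: raw, problem: `uses ${url.protocol} instead of ${expected.protocol}` });
    }
    if (url.host !== expected.host) {
      issues.push({ url: raw, problem: `non-production host ${url.host}` });
    }
    if (url.search !== '') {
      issues.push({ url: raw, problem: 'has query parameters' });
    }
    if (!url.pathname.endsWith('/')) {
      issues.push({ url: raw, problem: 'missing trailing slash' });
    }
    // Catches "/a" and "/a/" (or the same path on two hosts) submitted as separate entries.
    const key = url.pathname.replace(/\/+$/, '') || '/';
    if (seenPaths.has(key)) {
      issues.push({ url: raw, problem: 'duplicate of an earlier path' });
    }
    seenPaths.add(key);
  }
  return issues;
}
```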

Google's sitemap documentation is clear that sitemaps help search engines discover URLs. They do not override poor canonical decisions, robots rules, or weak page quality.

For Next.js, the sitemap approach usually differs between Pages Router and App Router projects. The App Router supports metadata files such as sitemap.ts and robots.ts in the app directory, which Next.js serves as /sitemap.xml and /robots.txt; both are documented in the Next.js metadata file pages. Those tools are useful, but they are only as correct as the data they read.
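A sketch of an App Router app/sitemap.ts that builds entries from the same data and canonical helper the pages use. Next.js serves the default export's return value as /sitemap.xml; the real return type is `MetadataRoute.Sitemap` from `next`, mirrored here by a local type so the example runs on its own. `getPages`, `canonicalUrl`, `buildSitemap`, and the `Page` shape are hypothetical stand-ins for your CMS query and URL helper.

```typescript
// In a real project: import type { MetadataRoute } from 'next' and return MetadataRoute.Sitemap.
type SitemapEntry = { url: string; lastModified?: string | Date };

// Hypothetical record shape from the CMS or route data.
type Page = { slug: string; status: 'published' | 'draft'; noindex: boolean; updatedAt: string };

const SITE_ORIGIN = 'https://www.example.com'; // assumption: one production origin

// Shared canonical helper: pages should use this same function for their canonical tag.
export function canonicalUrl(slug: string): string {
  const trimmed = slug.replace(/^\/+|\/+$/g, '');
  return trimmed === '' ? `${SITE_ORIGIN}/` : `${SITE_ORIGIN}/${trimmed}/`;
}

// Pure filter and map, kept separate so it can be tested without a CMS.
export function buildSitemap(pages: Page[]): SitemapEntry[] {
  return pages
    .filter((page) => page.status === 'published' && !page.noindex)
    .map((page) => ({ url: canonicalUrl(page.slug), lastModified: page.updatedAt }));
}

// Hypothetical data source: replace with the same query your dynamic routes use.
async function getPages(): Promise<Page[]> {
  return [
    { slug: '', status: 'published', noindex: false, updatedAt: '2026-02-01' },
    { slug: 'services/example', status: 'published', noindex: false, updatedAt: '2026-02-10' },
    { slug: 'draft-page', status: 'draft', noindex: false, updatedAt: '2026-02-12' },
  ];
}

export default async function sitemap(): Promise<SitemapEntry[]> {
  return buildSitemap(await getPages());
}
```

The design choice is that drafts and noindex pages are excluded by the same rules the pages apply to themselves, so the sitemap cannot submit a URL the page would refuse to index.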

Check robots.txt for Environment Drift

Robots files often break because staging and production need different behaviour.

Common failures include:

production accidentally disallowed
staging accidentally allowed
sitemap URL pointing at the wrong host
old disallow rules blocking new routes
rules copied from a previous platform
wildcard rules blocking assets needed for rendering
robots rules relied on for pages that should use noindex
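One way to reduce environment drift is to derive the robots output from a single environment flag rather than maintaining separate files. A sketch for app/robots.ts, assuming deployment on Vercel, which sets `VERCEL_ENV` to `production`, `preview`, or `development`; on other hosts, substitute your own variable. The local `RobotsConfig` type mirrors the subset of `MetadataRoute.Robots` used here, and `/api/` is only an example path.

```typescript
// Mirrors the subset of MetadataRoute.Robots (from 'next') used in this sketch.
type RobotsRule = { userAgent: string | string[]; allow?: string | string[]; disallow?: string | string[] };
type RobotsConfig = { rules: RobotsRule | RobotsRule[]; sitemap?: string | string[] };

const PRODUCTION_ORIGIN = 'https://www.example.com'; // assumption

// Pure function so both branches can be tested without deploying.
export function robotsFor(deployEnv: string | undefined): RobotsConfig {
  if (deployEnv !== 'production') {
    // Anything that is not production blocks crawling and advertises no sitemap.
    return { rules: { userAgent: '*', disallow: '/' } };
  }
  return {
    rules: { userAgent: '*', allow: '/', disallow: '/api/' },
    // Always the production host, never the preview URL the build happened to run on.
    sitemap: `${PRODUCTION_ORIGIN}/sitemap.xml`,
  };
}

// Assumption: VERCEL_ENV is available at build/request time (Vercel-specific).
export default function robots(): RobotsConfig {
  return robotsFor(process.env.VERCEL_ENV);
}
```

This fails closed: if the variable is missing in production, the whole site is disallowed. That trade-off is worth pairing with a post-deploy check that production's robots.txt does not contain `Disallow: /`.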

Remember the distinction: robots.txt controls crawling, not indexing by itself. If a URL is blocked from crawling but linked elsewhere, search engines may still know about it and may show limited information. If a page must not appear in search, keep it crawlable and apply noindex through a robots meta tag or an X-Robots-Tag response header, so crawlers can actually see the instruction.

Google's robots.txt guide is worth checking before using robots rules as a blunt exclusion tool.
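In the App Router, a single page can opt out with `export const metadata = { robots: { index: false } }`, which renders a robots meta tag. For groups of paths or non-HTML responses, an `X-Robots-Tag` header set from `headers()` in next.config.js is an alternative. A sketch that builds that `headers()` array from path patterns; `NOINDEX_SOURCES` and `noindexHeaders` are hypothetical names, and the patterns are examples.

```typescript
// Shape of one entry returned from `async headers()` in next.config.js.
type HeaderRule = { source: string; headers: { key: string; value: string }[] };

// Hypothetical paths that must stay crawlable but out of the index.
export const NOINDEX_SOURCES = ['/search', '/account/:path*'];

// Attaches an X-Robots-Tag: noindex header to every matching response.
export function noindexHeaders(sources: string[]): HeaderRule[] {
  return sources.map((source) => ({
    source,
    headers: [{ key: 'X-Robots-Tag', value: 'noindex' }],
  }));
}

// next.config.js would then use it along these lines:
// module.exports = { async headers() { return noindexHeaders(NOINDEX_SOURCES); } };
```

These paths should not also be disallowed in robots.txt; otherwise crawlers never fetch the response that carries the header.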

Validate Dynamic Route Generation

Most serious Next.js sitemap problems involve dynamic routes.

The template exists, but the data source changes. A CMS entry is unpublished. A slug has been normalised differently. A route is excluded from static generation. A category page exists in navigation but never appears in the sitemap. A product page is generated, but only if it appears in a particular API response during build.

Check:

how dynamic slugs are fetched
whether draft or unpublished records are filtered correctly
whether locale, market, or brand dimensions are included
whether empty categories produce URLs
whether deleted CMS entries are removed
whether route generation fails silently
whether pagination URLs are complete
whether the sitemap and app use the same slug normalisation
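Two items on that list, shared slug normalisation and silent generation failures, can be handled in one place. A sketch, assuming a hypothetical `CmsRecord` shape and a `normaliseSlug` rule (lowercase, spaces to hyphens, no leading or trailing slashes). The idea is that `generateStaticParams` and the sitemap import the same function, and the build fails rather than quietly producing fewer routes.

```typescript
// Hypothetical CMS record shape.
type CmsRecord = { slug: string; published: boolean };

// One normalisation rule, intended to be imported by routes, sitemap, and canonical helpers alike.
export function normaliseSlug(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .replace(/\s+/g, '-')
    .replace(/^\/+|\/+$/g, '');
}

// Returns params in the { slug } shape a [slug] segment's generateStaticParams returns.
export function staticParamsFrom(records: CmsRecord[]): { slug: string }[] {
  const published = records.filter((r) => r.published);
  if (published.length === 0) {
    // An empty result more often means the data fetch failed than that the site has no pages.
    throw new Error('Route generation returned no published records');
  }
  const seen = new Map<string, string>(); // normalised slug -> original slug
  for (const record of published) {
    const slug = normaliseSlug(record.slug);
    const previous = seen.get(slug);
    if (previous !== undefined) {
      throw new Error(`Slugs "${previous}" and "${record.slug}" both normalise to "${slug}"`);
    }
    seen.set(slug, record.slug);
  }
  return [...seen.keys()].map((slug) => ({ slug }));
}

// app/[slug]/page.tsx could then use it with the same query the sitemap uses
// (fetchPages is hypothetical):
// export async function generateStaticParams() { return staticParamsFrom(await fetchPages()); }
```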

This matters for migration work too. If a site has just moved from Gatsby, WordPress, Shopify, or a React SPA, the old URL estate needs to be compared with the new generated route set. The article Traffic Dropped After a Replatform: The Technical Checks I Run First covers the wider recovery process.

Check Canonical Helpers and Redirects Together

A page can appear crawlable whilst still sending conflicting signals.

For each affected URL, compare:

requested URL
final URL after redirects
canonical URL
Open Graph URL
sitemap URL
internal links pointing to it
alternate language URLs

These should form a coherent story. If the sitemap submits /services/example/, internal links point at /services/example, redirects add a slash, and the canonical uses a preview host, search engines have to resolve avoidable noise.
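Once those values are collected, from a crawler export or a fetch script, the comparison is easy to automate. A minimal sketch; the `UrlSignals` shape and `signalConflicts` name are hypothetical, and it treats exact string equality as agreement, so normalise hosts and trailing slashes the same way your canonical helper does before calling it. Alternate language URLs are omitted for brevity.

```typescript
// Hypothetical shape for the signals collected about one URL.
type UrlSignals = {
  requested: string;
  finalUrl: string; // after following redirects
  canonical: string | null; // null when the page has no canonical tag
  ogUrl: string | null;
  sitemapUrl: string | null; // null when the URL is not in the sitemap
  internalLinkTargets: string[]; // hrefs of internal links to this page
};

// Lists every signal that disagrees with the canonical (or with the final URL if there is none).
export function signalConflicts(s: UrlSignals): string[] {
  const conflicts: string[] = [];
  const target = s.canonical ?? s.finalUrl;
  if (s.canonical === null) conflicts.push('no canonical tag');
  if (s.finalUrl !== target) conflicts.push(`final URL ${s.finalUrl} differs from canonical ${target}`);
  // Reported so you can confirm the redirect is intended, not necessarily an error.
  if (s.requested !== s.finalUrl) conflicts.push(`${s.requested} redirects to ${s.finalUrl}`);
  if (s.ogUrl !== null && s.ogUrl !== target) conflicts.push(`og:url ${s.ogUrl} differs from canonical`);
  if (s.sitemapUrl !== null && s.sitemapUrl !== target) conflicts.push(`sitemap submits ${s.sitemapUrl}`);
  for (const href of s.internalLinkTargets) {
    if (href !== target) conflicts.push(`internal link points at ${href}`);
  }
  return conflicts;
}
```

Run against the example above (sitemap with a slash, links without, a redirect adding the slash, and a canonical on a preview host), every one of those signals is reported.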

Next.js makes redirects straightforward in many cases, but centralising redirects does not guarantee quality. Redirects need intent mapping. A retired URL should point to the best replacement, not just a convenient parent page.
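Intent mapping is a human job, but a script can catch the mechanical failures that tend to accompany large redirect maps: chains and loops. A sketch over the `{ source, destination, permanent }` objects that `redirects()` in next.config.js returns; it only handles literal paths, not `:param` patterns or regex sources, and `findRedirectChains` is a hypothetical name.

```typescript
type Redirect = { source: string; destination: string; permanent: boolean };

// Follows each literal source through the map and reports chains (more than one hop) and loops.
export function findRedirectChains(redirects: Redirect[]): string[] {
  const next = new Map(redirects.map((r) => [r.source, r.destination] as [string, string]));
  const problems: string[] = [];

  for (const [start, first] of next) {
    const hops = [start, first];
    let current = first;
    let looped = false;
    while (next.has(current)) {
      current = next.get(current)!;
      hops.push(current);
      if (hops.indexOf(current) < hops.length - 1) {
        // We have been here before, so following further would never terminate.
        looped = true;
        break;
      }
    }
    if (looped) problems.push(`loop: ${hops.join(' -> ')}`);
    else if (hops.length > 2) problems.push(`chain: ${hops.join(' -> ')}`);
  }
  return problems;
}
```

A reported chain is usually fixed by pointing the first source straight at the final destination, which is also a good moment to check that the destination is the best replacement rather than a convenient parent.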

Inspect Rendered Pages, Not Just Files

Do not stop after opening /sitemap.xml and /robots.txt.

Open representative rendered pages and check:

title
meta description
canonical
robots meta
h1
primary content
internal links
structured data
pagination links
status code
response headers

This is especially important for pages that depend on CMS data or revalidation. A sitemap can contain a URL that looked valid at build time, but the rendered page may now return thin content, an error state, stale content, or a canonical to something else.
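A quick way to spot-check representative pages is to fetch them without following redirects and pull out some of the signals from the list above. A rough sketch, assuming Node 18 or later (global `fetch` and `Headers`); the regular expressions are deliberately simple and will miss unusual markup, so treat it as triage rather than a parser. `extractSignals`, `attrOf`, and `inspect` are hypothetical names.

```typescript
type PageSignals = {
  status: number;
  location: string | null; // redirect target, if any
  xRobotsTag: string | null;
  title: string | null;
  canonical: string | null;
  robotsMeta: string | null;
  h1Count: number;
};

// Returns the value of `attr` from the first tag matching `tagPattern`, if any.
function attrOf(html: string, tagPattern: RegExp, attr: string): string | null {
  const tag = html.match(tagPattern)?.[0];
  if (!tag) return null;
  const value = tag.match(new RegExp(`(?:^|\\s)${attr}\\s*=\\s*["']([^"']*)["']`, 'i'));
  return value ? value[1] : null;
}

export function extractSignals(html: string, status: number, headers: Headers): PageSignals {
  return {
    status,
    location: headers.get('location'),
    xRobotsTag: headers.get('x-robots-tag'),
    title: html.match(/<title[^>]*>([^<]*)<\/title>/i)?.[1].trim() ?? null,
    canonical: attrOf(html, /<link[^>]*rel\s*=\s*["']canonical["'][^>]*>/i, 'href'),
    robotsMeta: attrOf(html, /<meta[^>]*name\s*=\s*["']robots["'][^>]*>/i, 'content'),
    h1Count: (html.match(/<h1[\s>]/gi) ?? []).length,
  };
}

// redirect: 'manual' keeps the 3xx response so the redirect itself is visible.
export async function inspect(url: string): Promise<PageSignals> {
  const response = await fetch(url, { redirect: 'manual' });
  const html = response.status === 200 ? await response.text() : '';
  return extractSignals(html, response.status, response.headers);
}
```

This checks the server response only; content injected by client-side JavaScript will not appear, which is itself a useful signal for pages that should render their primary content on the server.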

If CMS publishing or revalidation is involved, the related problem is covered in Why Your Next.js App Router Page is Stale: Cache Tags, Revalidation, and CMS Publishing.

Use a Repeatable Crawlability Checklist

For every important template, check:

Is the URL intended to be public?
Does it return 200?
Does it redirect?
Is it in the sitemap?
Is it blocked by robots.txt?
Does it have noindex?
Does the canonical point to itself or the correct representative?
Is it linked internally?
Does rendered HTML contain useful content?
Does structured data match visible content?
Is the URL present in generated route data?
Is it present or absent in Search Console for the expected reason?

That list is deliberately plain. Most crawlability fixes are not clever. They are the result of comparing outputs that should agree and finding where they drifted.
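Because the checklist is plain, it also translates directly into data that can be compared between crawls. A sketch of one way to record the answers per template; the `TemplateAudit` fields map to the questions above, the field and function names are hypothetical, and Search Console status is left out because judging whether the reason is expected needs a person.

```typescript
type TemplateAudit = {
  url: string;
  intendedPublic: boolean;
  status: number;
  redirects: boolean;
  inSitemap: boolean;
  blockedByRobots: boolean;
  noindex: boolean;
  canonicalIsSelfOrRepresentative: boolean;
  internallyLinked: boolean;
  hasUsefulContent: boolean;
  structuredDataMatchesContent: boolean;
  inRouteData: boolean;
};

// Returns the checks that fail for one URL.
export function failedChecks(a: TemplateAudit): string[] {
  if (!a.intendedPublic) {
    // Non-public URLs: flag sitemap inclusion and indexable 200 responses.
    const leaks: string[] = [];
    if (a.inSitemap) leaks.push('private URL is in the sitemap');
    if (a.status === 200 && !a.noindex) leaks.push('private URL is indexable');
    return leaks;
  }
  // Each pair is [passes, message shown when it fails], in checklist order.
  const checks: [boolean, string][] = [
    [a.status === 200, `returns ${a.status}, not 200`],
    [!a.redirects, 'redirects'],
    [a.inSitemap, 'missing from sitemap'],
    [!a.blockedByRobots, 'blocked by robots.txt'],
    [!a.noindex, 'has noindex'],
    [a.canonicalIsSelfOrRepresentative, 'canonical points elsewhere'],
    [a.internallyLinked, 'not linked internally'],
    [a.hasUsefulContent, 'rendered HTML lacks useful content'],
    [a.structuredDataMatchesContent, 'structured data does not match visible content'],
    [a.inRouteData, 'missing from generated route data'],
  ];
  return checks.filter(([ok]) => !ok).map(([, message]) => message);
}
```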

Wrapping Up

In a Next.js site, crawlability depends on several generated and rendered surfaces agreeing with each other.

The sitemap, robots file, route generation, redirects, canonicals, internal links, and rendered page output all have to describe the same site. If they are maintained as separate bits of plumbing, they will eventually disagree.

The best fix is not a bigger sitemap. It is a route and discovery system that can be checked, regenerated, and trusted.

Key Takeaways

Treat the sitemap as an output, not the source of truth.
Compare generated routes, sitemap URLs, canonicals, redirects, and internal links.
Keep production and staging robots rules separate and deliberate.
Pay special attention to dynamic routes from CMSes and e‑commerce data.
Inspect rendered page output before deciding the sitemap is correct.
Use crawl evidence to find drift between the route set and discovery surfaces.
