Next.js Site Broke After Deploy: A Production Triage Checklist

In Brief
Start by narrowing the failure before changing code. Compare local, preview, and production behaviour, then check deploy logs, environment variables, runtime errors, CMS data, redirects, middleware, auth loops, cache state, and recent content changes. The aim is to decide whether to roll back, patch forward, or keep investigating.
A broken Next.js deploy does not always look broken at first.
The homepage may load. The deployment may be green. A few routes may work. Then someone finds a product page throwing 500s, an auth loop, a missing CMS entry, a blank template, or a route that only fails in production.
The first job is not to be clever. It is to reduce uncertainty quickly.
You need to know whether to roll back, patch forward, disable a feature, fix data, or keep investigating. That decision depends on evidence: which routes fail, which users are affected, which release introduced it, and whether the failure is code, configuration, content, cache, or infrastructure.
Decide Whether This Is an Incident
Before opening files, decide the level of response.
Ask:
- Is the site unavailable?
- Are important conversion paths broken?
- Are authenticated users locked out?
- Are payments, forms, booking, or enquiry flows affected?
- Is search traffic being sent to broken pages?
- Is the problem limited to one route family?
- Can the previous deployment be restored safely?
If the failure affects revenue, lead generation, security, or a large user group, treat it as an incident. That does not mean panic. It means working from a short, shared timeline and making rollback decisions deliberately.
A quiet Slack thread and five people trying fixes independently is not triage. It is how production problems become harder to explain later.
Capture the Failing Surface
Collect examples before changing anything.
For each affected page or flow, capture:
- URL
- status code
- environment
- browser console errors
- server log errors
- screenshot if useful
- whether the issue appears on refresh
- whether it appears in preview and production
- user role or auth state
- release or commit hash
Vercel's documentation on deployment logs is useful here, especially when the local app does not reproduce the issue. The important log line often appears earlier in the log than the visible failure, so read upwards from the error rather than stopping at it.
If observability is already in place, use release markers, route labels, and structured logs. If it is not, this is the moment that proves why production observability for Next.js is not optional decoration.
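One way to keep the captured examples comparable is to record them in a fixed shape and let a small helper answer the first triage question: which routes fail in production but not in preview? The sketch below is a minimal illustration, not a prescribed format. The `Capture` fields and the `productionOnlyFailures` helper are hypothetical names, and treating any status of 400 or above as a failure is an assumption.

```typescript
// Hypothetical record for one observation; field names are illustrative.
type Environment = "local" | "preview" | "production";

interface Capture {
  url: string; // path of the affected page or flow
  environment: Environment;
  status: number; // HTTP status code observed
  authState: "anonymous" | "signed-in";
  commit: string; // release or commit hash that served the response
  note?: string; // console or server log excerpt, never a secret
}

// Assumption: any status of 400 or above counts as a failure.
function isFailure(status: number): boolean {
  return status >= 400;
}

// Returns paths that fail in production while the same path, with the same
// auth state, succeeds in preview.
function productionOnlyFailures(captures: Capture[]): string[] {
  const failing: string[] = [];
  for (const c of captures) {
    if (c.environment !== "production" || !isFailure(c.status)) continue;
    const okInPreview = captures.some(
      (p) =>
        p.environment === "preview" &&
        p.url === c.url &&
        p.authState === c.authState &&
        !isFailure(p.status)
    );
    if (okInPreview && failing.indexOf(c.url) === -1) failing.push(c.url);
  }
  return failing;
}
```

A route that fails in both environments points towards code or shared data. A route that fails only in production points towards configuration, cache, or production-only content.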
Check the Deployment Diff
Start with the smallest possible change set.
What changed since the last good deploy?
- code
- environment variables
- CMS content
- redirects or rewrites
- middleware
- package versions
- Node version
- build configuration
- feature flags
- external API credentials
- cache or revalidation behaviour
It is common to blame the most visible code change and miss a configuration change made in the dashboard. Next.js production failures often come from the space between code and environment.
If the deploy moved routes, changed rendering mode, or introduced dynamic data fetching, inspect those routes first. If the deploy touched auth, middleware, or redirects, check loops and protected routes before digging into components.
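If the team can write down the state of the last good deploy and the current one as flat key/value snapshots, the diff itself can be computed rather than remembered. The following is a rough sketch under that assumption. The `Snapshot` shape, the key naming (`node`, `env:...`, `flag:...`), and the `diffSnapshots` helper are all hypothetical. For environment variables, record the name and scope only, never the value.

```typescript
// Hypothetical flat snapshot of a deploy: versions, flag states, and the
// names and scopes of environment variables (never their values).
type Snapshot = Record<string, string>;

interface SnapshotDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

function hasKey(snapshot: Snapshot, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(snapshot, key);
}

// Compares the last known good snapshot with the current one and reports
// which keys were added, removed, or changed.
function diffSnapshots(lastGood: Snapshot, current: Snapshot): SnapshotDiff {
  const diff: SnapshotDiff = { added: [], removed: [], changed: [] };
  for (const key of Object.keys(current)) {
    if (!hasKey(lastGood, key)) diff.added.push(key);
    else if (lastGood[key] !== current[key]) diff.changed.push(key);
  }
  for (const key of Object.keys(lastGood)) {
    if (!hasKey(current, key)) diff.removed.push(key);
  }
  return diff;
}
```

The point of the flat shape is that dashboard changes, such as a removed variable or a flipped flag, show up in the same list as code-side changes.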
Check Environment Variables and Secrets
Environment drift is a common production‑only failure mode, because the values live outside the repository and rarely show up in a code review.
Check:
- missing variables
- variables set in preview but not production
- stale secrets
- wrong API base URLs
- different CMS environment names
- different auth callback URLs
- variables exposed to the browser that should not be
- variables expected in the browser but missing the NEXT_PUBLIC_ prefix
The related article on managing environment variables in Next.js covers the framework side. Vercel also documents environment variables from the platform side.
Do not paste secrets into chat or logs. Verify names, scopes, and environments without turning the incident into a security problem.
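A check like this can be scripted so that it reports names and problems but never values. The sketch below assumes a hand-maintained list of required variables. `EnvRequirement`, `EnvProblem`, and `checkEnv` are hypothetical names, and the variable names in the usage note are examples only. The one framework fact it relies on is that Next.js exposes a variable to browser code only when its name starts with `NEXT_PUBLIC_`.

```typescript
// Hypothetical description of one variable the app depends on.
interface EnvRequirement {
  name: string;
  browser: boolean; // true if client-side code reads this variable
}

interface EnvProblem {
  name: string;
  problem: string;
}

// Reports missing variables and prefix mismatches. Only names appear in the
// output, so the result is safe to paste into an incident channel.
function checkEnv(
  requirements: EnvRequirement[],
  env: Record<string, string | undefined>
): EnvProblem[] {
  const problems: EnvProblem[] = [];
  for (const req of requirements) {
    const isPublic = /^NEXT_PUBLIC_/.test(req.name);
    if (req.browser && !isPublic) {
      problems.push({ name: req.name, problem: "read in the browser but lacks the NEXT_PUBLIC_ prefix" });
    }
    if (!req.browser && isPublic) {
      problems.push({ name: req.name, problem: "server-only but exposed through the NEXT_PUBLIC_ prefix" });
    }
    const value = env[req.name];
    if (value === undefined || value.trim() === "") {
      problems.push({ name: req.name, problem: "missing or empty" });
    }
  }
  return problems;
}
```

Running the same check with `process.env` in preview and in production, and comparing the two reports, is one way to spot a variable that exists in one scope but not the other.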
Separate Build‑Time Failures from Runtime Failures
A green deployment means the build completed. It does not mean every route works.
Split failures into:
- build‑time errors
- static generation errors
- runtime server errors
- client‑side hydration errors
- failed browser API calls
- auth or middleware loops
- cache and revalidation errors
- content data errors
If a route is statically generated, the problem may have happened during build and only become visible as a pre‑rendered error state. If a route renders at request time, check runtime logs and data dependencies. If the page loads and then breaks after hydration, inspect browser console errors and mismatched client state.
For build‑specific issues, debugging failing Next.js builds on Vercel is the better starting point.
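The split above can be written down as a first-pass classifier that turns a handful of observations into a place to start looking. This is only a heuristic sketch. The `Signals` fields, the `classifyFailure` helper, and the redirect threshold of 5 are assumptions made for illustration, and a real incident may need signals this does not model, such as cache state or content errors.

```typescript
type FailureKind =
  | "build"
  | "static-generation"
  | "runtime-server"
  | "hydration"
  | "browser-api"
  | "redirect-loop"
  | "unknown";

// Hypothetical set of observations gathered for one failing route.
interface Signals {
  buildSucceeded: boolean;
  prerendered: boolean; // route was generated at build time
  status: number; // status of the document request
  redirectCount: number; // redirects followed before the final response
  consoleHasHydrationError: boolean;
  failedBrowserRequests: number; // failed fetch/XHR calls after load
}

// Suggests where to look first. The order of checks is a judgement call:
// earlier stages are ruled out before later ones.
function classifyFailure(s: Signals): FailureKind {
  if (!s.buildSucceeded) return "build";
  if (s.redirectCount > 5) return "redirect-loop"; // threshold is arbitrary
  if (s.status >= 500) return s.prerendered ? "static-generation" : "runtime-server";
  if (s.consoleHasHydrationError) return "hydration";
  if (s.failedBrowserRequests > 0) return "browser-api";
  return "unknown";
}
```

An "unknown" result is still useful: it says the obvious signals are clean and that cache, revalidation, or content data deserve attention next.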
Inspect Content and Data Contracts
CMS data can break a deploy without any code change.
Examples:
- missing required field
- unpublished linked entry
- image without dimensions
- invalid slug
- circular reference
- unexpected rich text node
- empty array where code expects one item
- deleted category or tag
- date format change
- HTML embedded where plain text was expected
The fix is not always "make the component defensive". Sometimes the right fix is editorial validation. Sometimes it is a schema migration. Sometimes it is a one‑off content correction. The triage decision should identify which layer owns the problem.
If the issue is stale content rather than broken content, check cache and revalidation before changing templates.
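To find out whether content is the cause, it can help to run the suspect entries through an explicit contract check outside the template. The sketch below is a hand-written validator for an imagined product entry. The field names (`slug`, `title`, `image`, `categories`, `publishedAt`) and the slug pattern are assumptions and will differ for a real CMS schema. It returns every violation rather than stopping at the first one.

```typescript
// Validates one CMS entry against a hypothetical product contract and
// returns a list of human-readable violations (empty when the entry is valid).
function validateEntry(entry: unknown): string[] {
  if (typeof entry !== "object" || entry === null) return ["entry is not an object"];
  const fields = entry as Record<string, unknown>;
  const errors: string[] = [];

  // Assumed slug format: lowercase words separated by single hyphens.
  const slug = fields["slug"];
  if (typeof slug !== "string" || !/^[a-z0-9]+(-[a-z0-9]+)*$/.test(slug)) {
    errors.push("slug: missing or invalid");
  }

  const title = fields["title"];
  if (typeof title !== "string" || title.trim() === "") {
    errors.push("title: missing required field");
  }

  const image = fields["image"];
  if (typeof image !== "object" || image === null) {
    errors.push("image: missing");
  } else {
    const img = image as Record<string, unknown>;
    if (typeof img["width"] !== "number" || typeof img["height"] !== "number") {
      errors.push("image: missing dimensions");
    }
  }

  const categories = fields["categories"];
  if (!Array.isArray(categories) || categories.length === 0) {
    errors.push("categories: expected at least one item");
  }

  const publishedAt = fields["publishedAt"];
  if (typeof publishedAt !== "string" || isNaN(Date.parse(publishedAt))) {
    errors.push("publishedAt: missing or unparseable date");
  }

  return errors;
}
```

The error list also helps with ownership. A missing required field suggests editorial validation, while a changed date format across many entries suggests a schema migration.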
Check Redirects, Middleware, and Auth Loops
Redirects and middleware can break a site even when every individual rule looks reasonable. The failure usually comes from how the rules combine.
Look for:
- redirect chains
- locale redirects looping
- auth middleware catching public assets
- preview routes blocked by auth
- trailing slash redirects conflicting with canonical rules
- old rewrites sending requests to removed routes
- middleware depending on unavailable cookies
- environment‑specific host checks
When a route works locally but fails on the deployed site, middleware and deployment host assumptions deserve early attention.
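Chains and loops are easier to see when the rules are followed mechanically instead of read one at a time. The sketch below walks a simplified redirect table from a starting path and reports the hops. It is an illustration only: `RedirectMap` and `traceRedirects` are hypothetical, and the table uses exact path matches, whereas real Next.js redirects, rewrites, and middleware can match patterns and inspect headers or cookies.

```typescript
// Simplified redirect table: exact source path -> destination path.
type RedirectMap = Record<string, string>;

interface Trace {
  hops: string[]; // every path visited, starting with the requested one
  loop: boolean; // true when a path is visited twice
}

// Follows redirects from `start` until no rule matches, a loop is found,
// or `maxHops` redirects have been followed.
function traceRedirects(redirects: RedirectMap, start: string, maxHops = 10): Trace {
  const hops: string[] = [start];
  let current = start;
  for (let i = 0; i < maxHops; i++) {
    const next = Object.prototype.hasOwnProperty.call(redirects, current)
      ? redirects[current]
      : undefined;
    if (next === undefined) break; // no rule matches: chain ends here
    const seen = hops.indexOf(next) !== -1;
    hops.push(next);
    if (seen) return { hops, loop: true };
    current = next;
  }
  return { hops, loop: false };
}
```

In the example table used to exercise this, an old route redirects to `/new`, a trailing slash rule sends `/new` to `/new/`, and a canonical rule sends `/new/` back to `/new`. Each rule looks fine alone, and the trace shows the loop.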
Choose Rollback or Patch Forward
Roll back when:
- the blast radius is large
- the old deployment is known good
- data has not migrated irreversibly
- the issue affects revenue or critical user journeys
- the fix is not obvious
Patch forward when:
- the issue is isolated
- the fix is low risk
- rollback would reintroduce a larger problem
- data or external state has moved on
- the deployment contains important unrelated fixes
Do not let pride decide. A rollback is not failure. It is a production control.
Wrapping Up
A broken Next.js deploy is easiest to fix when the team resists the urge to guess.
Capture failing URLs, identify the blast radius, compare the release with the last known good state, check environment drift, separate build and runtime failures, and inspect content contracts before changing code.
Most production issues are not mysterious. They are just spread across code, data, configuration, cache, and platform behaviour. Triage is the process of putting those pieces back into order.
Key Takeaways
- Decide quickly whether the failure is an incident.
- Capture URLs, logs, status codes, and release identifiers before changing anything.
- Check environment variables, secrets, redirects, middleware, and CMS data early.
- Separate build‑time, runtime, hydration, auth, and cache failures.
- Roll back when the blast radius is large and the previous deploy is safe.
- Patch forward only when the fix is clear and lower risk than rollback.