
How Do You Optimize a Next.js Website So AI Crawlers Can Read Your Content?

By Vigo Nordin, Co-Founder at SCALEBASE. Published March 30, 2026. 10 min read.

TL;DR

AI crawlers (GPTBot, ClaudeBot, PerplexityBot) don't execute JavaScript. Next.js sites with client-side rendering are invisible. Fix: server-render all content, schema, and metadata. 3 critical config changes.

Why are Next.js sites often invisible to AI crawlers?

Next.js routes that rely on client-side data fetching ship HTML containing little more than an empty root div and a bundle of JavaScript. AI crawlers (GPTBot, ClaudeBot, PerplexityBot) fetch this HTML, find no content, and move on. The page effectively does not exist in their index.

This differs from Googlebot, which runs a headless Chromium instance and executes JavaScript before indexing. Googlebot sees the fully rendered page. AI crawlers do not. A 2025 Vercel analysis found that 38% of Next.js sites deployed on their platform used at least one route that relied entirely on client-side data fetching.

There are three rendering modes in Next.js: Client-Side Rendering (CSR), Server-Side Rendering (SSR), and Static Site Generation (SSG). For AEO, only SSR and SSG produce HTML that AI crawlers can read. CSR pages are blank until JavaScript executes in a browser environment that AI crawlers do not provide.

You can test this immediately. Run: curl -s https://yoursite.com/your-page | head -100. If the output contains your article text, headings, and schema, the page is server-rendered. If it contains only a root div and script tags, AI crawlers see nothing.
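The same check can run as a script. This is a minimal sketch, assuming Node 18+ (for the built-in fetch); the URL and the content marker are placeholders you would replace with a real page and a phrase you know appears on it:

```typescript
// A crawler's-eye check: inspect raw HTML without executing JavaScript and
// look for content you know is on the page.
function isServerRendered(html: string, marker: string): boolean {
  return html.includes(marker);
}

// URL and marker are placeholders; the User-Agent matches what GPTBot sends.
async function checkPage(url: string, marker: string): Promise<boolean> {
  const res = await fetch(url, { headers: { "User-Agent": "GPTBot" } });
  return isServerRendered(await res.text(), marker);
}

// A CSR-only shell fails the check; server-rendered HTML passes.
console.log(isServerRendered('<div id="__next"></div>', "Pricing")); // false
console.log(isServerRendered("<h1>Pricing plans</h1>", "Pricing")); // true
```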

Which AI crawlers execute JavaScript?

None of the major AI crawlers execute JavaScript. GPTBot, ClaudeBot, and PerplexityBot all fetch raw HTML only. Googlebot is the exception among search crawlers — it renders JavaScript. Bingbot has partial rendering capabilities but does not reliably execute complex client-side applications.

Crawler | Executes JavaScript | Operator | Notes
GPTBot | No | OpenAI | Fetches raw HTML only, respects robots.txt
ClaudeBot | No | Anthropic | Fetches raw HTML only, respects robots.txt
PerplexityBot | No | Perplexity | Fetches raw HTML only, high crawl frequency
Googlebot | Yes | Google | Headless Chromium, full JS rendering with 5-second timeout
Bingbot | Partial | Microsoft | Limited JS rendering, unreliable for SPAs

This table reflects publicly documented crawler behavior as of early 2026. The practical implication: if your Next.js page works for Googlebot, that tells you nothing about AI crawler visibility. You must test with a non-rendering fetch like curl.

How to move JSON-LD from client-side to server-rendered

In the App Router (Next.js 13+), place JSON-LD in a script tag returned directly from a Server Component. Do not use useEffect or any client-side hook. The script tag must appear in the component's JSX return, not in a dynamically loaded module. This guarantees it is included in the initial HTML response.

For the App Router, define your JSON-LD object in the page.tsx file and render it inside a script tag with type="application/ld+json" and dangerouslySetInnerHTML. Because Server Components run on the server by default, this JSON-LD will be present in the raw HTML. Do not add 'use client' to any component that renders schema.
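A minimal App Router sketch of this pattern; the route path and article fields are illustrative placeholders:

```tsx
// app/blog/example/page.tsx — a Server Component (no 'use client'), so the
// JSON-LD below ships in the initial HTML response. Field values are
// illustrative placeholders.
export default function Page() {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: "How to optimize Next.js for AI crawlers",
    datePublished: "2026-03-30",
    author: { "@type": "Person", name: "Vigo Nordin" },
  };

  return (
    <article>
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
      />
      <h1>How to optimize Next.js for AI crawlers</h1>
      {/* article body */}
    </article>
  );
}
```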

In the Pages Router, use the Head component from next/head within getServerSideProps or getStaticProps pages. Build the JSON-LD object using data from the server-side function and inject it into Head. If you use getStaticProps, the schema is baked into the static HTML at build time — the most reliable approach for AI crawlers.
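A Pages Router sketch of the same idea, using getStaticProps so the schema is baked in at build time; the route and field values are placeholders:

```tsx
// pages/blog/example.tsx — schema built in getStaticProps and injected via
// next/head, so it is baked into the static HTML at build time.
// Field values are illustrative placeholders.
import Head from "next/head";
import type { GetStaticProps } from "next";

type Props = { title: string; jsonLd: string };

export const getStaticProps: GetStaticProps<Props> = async () => {
  const title = "How to optimize Next.js for AI crawlers";
  const jsonLd = JSON.stringify({
    "@context": "https://schema.org",
    "@type": "Article",
    headline: title,
    datePublished: "2026-03-30",
  });
  return { props: { title, jsonLd } };
};

export default function Post({ title, jsonLd }: Props) {
  return (
    <>
      <Head>
        <script
          type="application/ld+json"
          dangerouslySetInnerHTML={{ __html: jsonLd }}
        />
      </Head>
      <h1>{title}</h1>
    </>
  );
}
```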

A common mistake is importing a SchemaComponent that builds its JSON-LD inside useState or useEffect. Those hooks run only in the browser, so even if the parent page uses SSR, schema injected that way never appears in the raw HTML that crawlers fetch. Move all schema generation into Server Components or into the getServerSideProps/getStaticProps data flow.

For a complete reference of which schema types to implement, see schema markup for AEO.

How to verify your fixes are working

Verification requires two checks: confirming the content is in the raw HTML, and confirming schema is parseable. Both can be done from a terminal without any paid tools. These checks should run after every deployment.

  1. Fetch the page with curl: curl -s https://yoursite.com/page
  2. Check for content: pipe the output to grep and search for a known heading or paragraph from the page
  3. Check for schema: curl -s https://yoursite.com/page | grep 'application/ld+json'
  4. Validate the JSON-LD: extract the script block and run it through jsonlint or jq to confirm valid JSON
  5. Test with GPTBot user agent: curl -s -H 'User-Agent: GPTBot' https://yoursite.com/page — confirm you are not serving different content or blocking the crawler
  6. Run Google Rich Results Test on the URL — note this executes JS, so use it for schema validation only, not for SSR confirmation

In CI/CD pipelines, add a build step that fetches each critical page and asserts the presence of required schema types. A simple script checking for Organization, Article, and FAQPage in the HTML output catches regressions before they reach production. Teams using this approach report catching schema breakage in 14% of deployments that would otherwise go unnoticed.
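That CI step might look like the following sketch, assuming Node 18+; the URL list and the required-type list are assumptions to adapt to your site:

```typescript
// Post-deploy check: fetch each critical page's raw HTML and assert the
// required schema @types are present. Required types are an assumption.
const REQUIRED_TYPES = ["Organization", "Article", "FAQPage"];

// Extract every @type value from the JSON-LD blocks in an HTML string.
function schemaTypesIn(html: string): string[] {
  const types: string[] = [];
  const re = /<script[^>]*application\/ld\+json[^>]*>([\s\S]*?)<\/script>/g;
  for (const match of html.matchAll(re)) {
    try {
      // Re-serialize so @type values are easy to scan regardless of spacing.
      const json = JSON.stringify(JSON.parse(match[1]));
      for (const t of json.matchAll(/"@type":"([^"]+)"/g)) types.push(t[1]);
    } catch {
      // Invalid JSON-LD counts as missing; crawlers cannot parse it either.
    }
  }
  return types;
}

function missingTypes(html: string): string[] {
  const found = new Set(schemaTypesIn(html));
  return REQUIRED_TYPES.filter((t) => !found.has(t));
}

// URLs are placeholders; throw to fail the pipeline on any regression.
async function checkDeployment(urls: string[]): Promise<void> {
  for (const url of urls) {
    const html = await (await fetch(url)).text();
    const missing = missingTypes(html);
    if (missing.length > 0)
      throw new Error(`${url} is missing schema: ${missing.join(", ")}`);
  }
}
```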

Next.js AEO checklist

This checklist covers the 12 configuration items that determine whether a Next.js site is visible to AI crawlers. Each item is binary — it either works or it does not. A site missing any of the first four items is effectively invisible to AI search engines.

  1. All content pages use SSR (getServerSideProps) or SSG (getStaticProps / App Router Server Components) — no CSR-only pages
  2. JSON-LD schema is rendered server-side in every page's HTML, confirmed via curl
  3. Metadata (title, description, OG tags) is set via generateMetadata (App Router) or Head component with server-side data
  4. robots.txt allows GPTBot, ClaudeBot, PerplexityBot — check for accidental Disallow rules
  5. XML sitemap generated at /sitemap.xml with all canonical URLs, submitted to Google Search Console
  6. llms.txt file at /llms.txt providing site structure and content summary for AI systems
  7. Open Graph images generated server-side (next/og or static files), not client-rendered canvas elements
  8. ISR (Incremental Static Regeneration) configured with revalidate intervals under 24 hours for frequently updated content
  9. Canonical URLs set on every page to prevent duplicate content signals across routes
  10. Internal linking uses standard anchor tags, not JavaScript-only navigation (Next.js Link component is fine — it renders anchor tags)
  11. 404 and error pages return proper HTTP status codes, not soft 404s that serve 200 with error content
  12. Page load produces complete HTML within 3 seconds; AI crawlers may time out on slow server responses
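Item 3 of the checklist might look like this sketch in the App Router; the data values and domain are placeholders, and note that in Next.js 15+ `params` is a Promise and must be awaited (this sketch targets 13/14, where it is a plain object):

```tsx
// app/blog/[slug]/page.tsx — generateMetadata runs on the server, so title,
// description, canonical URL, and OG tags land in the initial HTML.
import type { Metadata } from "next";

type Props = { params: { slug: string } };

export async function generateMetadata({ params }: Props): Promise<Metadata> {
  // Replace with a real server-side fetch for this page's content.
  const title = `Example article: ${params.slug}`;
  const description = "Example description rendered server-side.";
  return {
    title,
    description,
    alternates: { canonical: `https://yoursite.com/blog/${params.slug}` },
    openGraph: { title, description, type: "article" },
  };
}
```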

SCALEBASE runs this checklist as part of every AEO technical audit. For the llms.txt standard mentioned above, see what is llms.txt and why it matters for AI.

Frequently Asked Questions

Does the App Router handle SSR differently than the Pages Router?

Yes. In the App Router, all components are Server Components by default unless marked with 'use client'. This means content and schema render server-side automatically. In the Pages Router, you must explicitly use getServerSideProps or getStaticProps — without them, pages default to CSR. The App Router is generally safer for AEO because SSR is the default behavior.

Can I use ISR (Incremental Static Regeneration) for AEO?

Yes. ISR generates static HTML that is fully visible to AI crawlers. The page is pre-rendered at build time and regenerated at the interval you set with the revalidate property. For AEO, set revalidate to 3600 (1 hour) or less for content that changes frequently. The key advantage: ISR pages load faster than SSR pages because they are served from cache.
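In the App Router, that configuration is a single exported constant (a config fragment, shown as a sketch):

```tsx
// app/blog/[slug]/page.tsx — regenerate this static page at most once per
// hour. Pages Router equivalent: return `revalidate: 3600` from
// getStaticProps alongside `props`.
export const revalidate = 3600; // seconds
```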

Does Vercel edge rendering work for AI crawlers?

Vercel Edge Functions and Edge Runtime can serve server-rendered HTML, so AI crawlers receive complete content. However, edge functions have a limited Node.js API surface — some database clients and libraries do not work at the edge. Test your edge-rendered pages with curl to confirm the output includes all content and schema before relying on edge rendering for AEO.

How do I test if GPTBot can see my schema?

Run: curl -s -H 'User-Agent: GPTBot' https://yoursite.com/page | grep 'application/ld+json'. If the JSON-LD block appears in the output, GPTBot can see it. Also check your server logs or CDN analytics for GPTBot requests and verify they receive 200 status codes, not 403 or redirects.

Should I create a separate HTML sitemap page for AI crawlers?

An HTML sitemap page with links to all content provides an additional discovery path for AI crawlers. Unlike XML sitemaps (which some AI crawlers may ignore), an HTML sitemap is just a regular page with links that any crawler follows. It is not required, but it costs nothing to implement and provides a fallback discovery mechanism. Link to it from your footer.

Vigo Nordin

Co-Founder of SCALEBASE, a specialist AEO and SEO agency based in Mallorca, Spain. Focused on AI search optimization, entity building, and engineering citations across ChatGPT, Perplexity, and Google AI Overviews.

LinkedIn

Ready to apply this to your business?

Stop being invisible to AI. Start being the answer your customers find.