Make Your WordPress Site Citable by ChatGPT and Claude: The /llms.txt Playbook

Your WordPress site already has a robots.txt and a sitemap.xml. It also needs a /llms.txt. ChatGPT, Claude, and Perplexity have replaced a meaningful share of organic search traffic, and the way they decide which URL to cite is no longer “rank for the keyword.” It is “is this page on a curated index that tells the model what’s worth fetching.” That index is /llms.txt, and most WordPress sites are still serving a 404 there.

I run llms.txt across roughly 30 sites: the agency portfolio, our plugin store, a handful of WordPress products, my own blog. Different sites need different patterns. This post is the playbook I wish existed when I started, with the three production-tested approaches we use, when each one is the right call, and the five pitfalls that ate a full afternoon before I figured them out.

If you build community plugins, run a WooCommerce store, or publish a tech blog, at least one of these patterns applies to your site today.

What /llms.txt actually is, in one paragraph

The llms.txt spec defines a plain-text file at the root of your site that lists your important pages with one-sentence descriptions. The format is simple Markdown. The first line is the site’s H1. A short summary follows. Then ## Section headings group entries that look like - [Title](URL): one sentence about the page. AI assistants treat it like a curated, machine-readable table of contents. When a user asks ChatGPT “what’s a good self-hosted Discourse alternative,” the model’s retrieval layer prefers to cite URLs that show up in a relevant llms.txt rather than crawl your whole site.
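To make the shape concrete, here is a minimal TypeScript sketch that renders a file in that layout: an H1, a blockquote summary, then ## sections of linked entries. The Entry and Section shapes and the renderLlmsTxt name are hypothetical, chosen only for illustration; they do not come from any plugin.

```typescript
// Hypothetical shapes for illustration only.
interface Entry {
  title: string;
  url: string;
  description: string;
}

interface Section {
  heading: string;
  entries: Entry[];
}

// Render an llms.txt body: H1, blockquote summary, then "## Section" lists.
function renderLlmsTxt(siteName: string, summary: string, sections: Section[]): string {
  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.heading}`, "");
    for (const e of section.entries) {
      lines.push(`- [${e.title}](${e.url}): ${e.description}`);
    }
    lines.push("");
  }
  // Collapse trailing blank lines into a single final newline.
  return lines.join("\n").replace(/\s+$/, "") + "\n";
}

const sample = renderLlmsTxt("Example Store", "Plugins for community sites.", [
  {
    heading: "Products",
    entries: [
      {
        title: "Forum Plugin",
        url: "https://example.com/forum/",
        description: "Self-hosted forum for WordPress.",
      },
    ],
  },
]);
console.log(sample);
```

Keeping each entry on one line makes the file easy to diff and easy to truncate or extend by section.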

It is not a ranking signal in the classical SEO sense. It’s a discoverability and citation hint. If you sell plugins, are mentioned in AI conversations, or want product comparisons to include you, it matters. If you run a personal blog with 12 readers, it doesn’t.

The mistake I made first was assuming I could ignore it because I already had a strong sitemap. The sitemap lists every URL. llms.txt curates the ones worth citing. AI models heavily prefer the curated list because it’s cheaper for them to reason over and because it implies someone has already done the work of deciding which pages are worth surfacing.

How AI assistants actually use llms.txt

Spend a few minutes watching what ChatGPT or Claude do when you ask them about a tool category. They run a web search. They get a list of URLs. They open three or four. They synthesise a recommendation.

The decision of which URLs to open is partially RAG-driven and partially heuristic. Claude.ai and ChatGPT both check for /llms.txt and /llms-full.txt at the root of any domain that shows up in their initial search results. If the file exists and lists URLs that match the query, those entries get prioritised for the actual content fetch. The model doesn’t have to guess from a sitemap with 7,000 entries; it gets a 100-line markdown file that a human curated.

This is why coverage matters. If your llms.txt lists 5 of your 100 plugins because Yoast curates aggressively, the model has a 5% chance of citing the right one. If it lists all 100 with rich descriptions, the model picks the actual best match for the query. We’ve watched this play out in production: queries like “self-hosted bbPress alternative” started returning our bbPress Alternative within a week of shipping the full-catalogue Worker overlay. Before, the same query returned Discourse, Flarum, and a Reddit thread.

The three patterns we use across our portfolio

There is no single “install plugin X, done” answer. The right pattern depends on what your site actually is.

Pattern 1: Pure blog or content site (use Rank Math or Yoast and bump the cap)

For a WordPress blog with no products, both Rank Math 3.0.55+ and Yoast 27.5+ generate /llms.txt automatically. Update either plugin and it just works.

The catch: both ship with low default caps. Yoast lists about 5 pages and 5 posts. Rank Math defaults to 50 posts. For a site with 488 posts (mine, vapvarun.com), the default 50 means about 10% of evergreen content is exposed.

Rank Math gives you a setting in the dashboard for the maximum-items cap. Bumping it from 50 to 500 took 30 seconds and changed coverage from 10% to ~100%. The file went from 24 KB to 234 KB and now lists every published post.

Yoast doesn’t expose the cap as a setting yet, but you can add a tiny mu-plugin that hooks the filter (the filter name has changed across versions, and the cleanest path is to read the plugin source on your own server to find the current name). For a site with serious content depth, switching to Rank Math is the smaller intervention.

If you’ve been writing about WordPress AI integrations or Gutenberg in the AI era, the cap matters. Those evergreen tutorials are exactly what AI assistants want to surface. Capping at 50 means everything older than two months never gets cited.

One more consideration with Rank Math: the order in which it lists posts. By default it sorts by published date descending. That’s fine for most sites, but if you have a few high-converting evergreen posts that should always rank first, the plugin doesn’t currently expose a way to pin them. We’re considering a small mu-plugin that re-orders the list to put pinned URLs at the top of the Posts section, but for most sites the default ordering is acceptable.

Pattern 2: WordPress with EDD or WooCommerce (Cloudflare Worker overlay)

The Yoast or Rank Math auto-generation pattern collapses on stores. A site with 100 products and Yoast’s 5-product cap is showing AI models 5% of its catalogue, and which 5 it picks is non-deterministic.

For our store at wbcomdesigns.com (~111 EDD downloads, 25 categories), here’s what we ship:

Yoast continues to generate the base /llms.txt. Keep its header, summary, posts list, pages list. That part is fine.
A Cloudflare Worker bound to the /llms.txt route intercepts every request before origin. The Worker fetches Yoast’s curated version, parses the structure, and replaces the truncated ## Downloads and ## Download Categories sections with the full list.
Descriptions for every product come from a pre-built JSON file. We run a Node script locally that scrapes the og:description and meta description from each product URL once a week, writes the data to enrichment-data.json, and the deploy script inlines that JSON into the Worker source before upload. Zero runtime fetches, near-zero CPU, all 111 products listed with rich descriptions.
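The overlay step can be sketched as a pure string transform. This is an illustration, not our production Worker: replaceSection, overlayDownloads, and the sample data are hypothetical, and the real Worker would run this on the text it fetched from origin before returning it as text/plain.

```typescript
// Hypothetical stand-in for the JSON that the deploy script inlines.
interface ProductInfo {
  url: string;
  title: string;
  description: string;
}

const ENRICHMENT_DATA: ProductInfo[] = [
  { url: "https://example.com/downloads/a/", title: "Plugin A", description: "Does A." },
  { url: "https://example.com/downloads/b/", title: "Plugin B", description: "Does B." },
];

// Replace the body of one "## Heading" section; append it if the heading is absent.
function replaceSection(doc: string, heading: string, newLines: string[]): string {
  const lines = doc.split("\n");
  const start = lines.findIndex((l) => l.trim() === `## ${heading}`);
  const block = [`## ${heading}`, "", ...newLines, ""];
  if (start === -1) {
    return [...lines, "", ...block].join("\n");
  }
  // The section runs until the next "## " heading or the end of the file.
  const next = lines.findIndex((l, i) => i > start && l.startsWith("## "));
  const end = next === -1 ? lines.length : next;
  return [...lines.slice(0, start), ...block, ...lines.slice(end)].join("\n");
}

// Swap the truncated Downloads section for the full, described list.
function overlayDownloads(baseDoc: string): string {
  const entries = ENRICHMENT_DATA.map((p) => `- [${p.title}](${p.url}): ${p.description}`);
  return replaceSection(baseDoc, "Downloads", entries);
}
```

Because the function only touches the named section, the header, summary, posts, and pages that the SEO plugin generated pass through unchanged.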

The result: the file went from 21 entries with 5 products listed to 363 entries with 107/111 products richly described, including pricing and stack tags. AI search citations on long-tail queries like “self-hosted Confluence alternative for WordPress” now actually return our pages, because we’re the only catalogue that gives the model a clean, indexed answer.

The Worker subrequest budget is the only constraint worth knowing. Cloudflare’s free tier caps at 50 subrequests per request. Doing 111 fetches at runtime breaks that limit immediately. Pre-computing the descriptions locally and inlining them as a JSON const sidesteps the limit entirely. The Worker becomes a pure lookup table.

The refresh workflow is two commands. We run node build-enrichment.mjs weekly to scrape current product copy and rebuild the JSON; this takes about 30 seconds for 130-ish URLs in parallel batches of 12. Then bash deploy.sh reads the JSON, inlines it as a const ENRICHMENT_DATA = {...} declaration in the Worker source, uploads via the Cloudflare API, and binds the route. The whole pipeline runs in under a minute. We trigger it manually for now; eventually we’ll wire it into a GitHub Action that runs on a Sunday cron.
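A simplified sketch of the scraping half of that workflow follows. It is not the contents of build-enrichment.mjs: extractDescription, chunk, and buildEnrichment are hypothetical names, the regex assumes the property or name attribute appears before content, and the page fetcher is injected as a parameter so the sketch does not depend on any particular HTTP client.

```typescript
// Pull og:description, falling back to the plain meta description.
// Regex-based for brevity; a real scraper would use an HTML parser.
function extractDescription(html: string): string | null {
  const patterns = [
    /<meta[^>]+property=["']og:description["'][^>]*?\scontent=(["'])(.*?)\1/i,
    /<meta[^>]+name=["']description["'][^>]*?\scontent=(["'])(.*?)\1/i,
  ];
  for (const re of patterns) {
    const m = re.exec(html);
    if (m && m[2].trim() !== "") return m[2].trim();
  }
  return null;
}

// Split a list into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Fetch pages in parallel batches (12 at a time by default) and
// collect one description per URL. `fetchHtml` is supplied by the caller.
async function buildEnrichment(
  urls: string[],
  fetchHtml: (url: string) => Promise<string>,
  batchSize: number = 12,
): Promise<Record<string, string>> {
  const data: Record<string, string> = {};
  for (const batch of chunk(urls, batchSize)) {
    const pages = await Promise.all(
      batch.map(async (url) => ({ url, html: await fetchHtml(url) })),
    );
    for (const page of pages) {
      const description = extractDescription(page.html);
      if (description !== null) data[page.url] = description;
    }
  }
  return data;
}
```

Batching keeps the scrape polite to your own origin while still finishing quickly; the resulting object can be written to disk as JSON and inlined at deploy time.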

Pattern 3: Static or headless site (build-time endpoint)

Our product store at store.wbcomdesigns.com is an Astro static site. There’s no WordPress to generate llms.txt from, and we don’t want a runtime worker for a static asset.

The cleanest pattern for any static-site generator is a build-time endpoint. In Astro, you create src/pages/llms.txt.ts with export const prerender = true. The endpoint reads from your content collection (whatever drives your product pages) and emits the formatted text file at build time. Every push to main regenerates it as part of the static site build. Zero runtime cost.

The same pattern works for Next.js (app/llms.txt/route.ts with export const dynamic = 'force-static'), Eleventy (a .11ty.js template), Hugo (a custom output format), or any framework that compiles your content to static pages.

What makes the static-site version interesting is what you can pull from. We pull the product tagline, description, top 4 features, pricing tier range, and tech-stack badges into each line. Each entry runs about 450 characters. Total file is 19 KB for 45 entries, all richly described. The extra context costs nothing because it’s already in the content collection; we just format it differently.
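As an illustration of that per-entry formatting, here is a minimal sketch. The Product interface and its field names are assumptions made up for this example, not the store's real content-collection schema, and the endpoint wiring is shown only as a comment because it depends on your framework.

```typescript
// Hypothetical product shape; field names are assumptions.
interface Product {
  name: string;
  url: string;
  tagline: string;
  description: string;
  features: string[];
  priceFrom: number;
  priceTo: number;
  stack: string[];
}

// Build one llms.txt entry: tagline, description, top 4 features, pricing, stack.
function formatProductLine(p: Product): string {
  const features = p.features.slice(0, 4).join("; ");
  const price =
    p.priceFrom === p.priceTo ? `$${p.priceFrom}` : `$${p.priceFrom}-$${p.priceTo}`;
  return (
    `- [${p.name}](${p.url}): ${p.tagline}. ${p.description} ` +
    `Features: ${features}. Pricing: ${price}. Stack: ${p.stack.join(", ")}.`
  );
}

// In Astro this would sit in src/pages/llms.txt.ts, roughly like so
// (the "products" collection name is an assumption):
//
//   export const prerender = true;
//   export async function GET() {
//     const products = await getCollection("products");
//     const body = products.map((p) => formatProductLine(p.data)).join("\n");
//     return new Response(body, {
//       headers: { "Content-Type": "text/plain; charset=utf-8" },
//     });
//   }
```

Keeping the formatter separate from the endpoint means the same function can be unit-tested and reused if you later move to a different static-site generator.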

The five pitfalls that ate a full afternoon

Reading the spec is not enough. These were the actual failure modes.

1. Cloudflare caches your homepage HTML and leaks CSRF tokens

We learned this on a different project (a Laravel app that had a public form on the homepage), but it’s the same root cause. If your origin renders a CSRF token (Blade’s @csrf directive) in a hidden form field and Cloudflare caches that HTML at the edge for guests, every guest within the cache TTL gets the same CSRF token. When they submit, the server rejects the mismatch and returns a 419 Page Expired error.

For llms.txt specifically, the file is plain text and shouldn’t have this problem. But if you’re running a WordPress site that mixes forms with AI bot visibility, audit your edge cache rules first. Don’t cache HTML containing per-session tokens.

2. Bot Management runs before custom WAF rules

Cloudflare’s “AI Bots Protection” setting blocks ChatGPT, Claude, and Perplexity by default on Pro and higher plans. Even if you’ve added a custom WAF rule that explicitly allows their UAs, the managed bot block fires earlier in the request pipeline. Your WAF allow rule never gets a chance.

The fix: set ai_bots_protection to disabled in Bot Management, then use a custom WAF rule for the granular allow list. The managed setting is too blunt to share a zone with a real allow policy. We hit this on every Wbcom property; the dashboard role permissions are fine, but the request-flow ordering is what controls the actual outcome, and that’s not obvious from the UI.

3. Em-dashes read as AI-generated

The first version of our llms.txt had em-dashes in the descriptions because the source product copy used them. AI models (ironically) are trained to flag em-dashes as a signal of AI-generated content. When the file you’re trying to use to attract AI citations is itself flagged as AI-written, the citation rate drops.

We convert every em-dash and en-dash to a hyphen before output. It’s three lines of code and worth running on every other piece of marketing copy you publish, too. The same applies to smart quotes and HTML entities (&amp; and friends): clean them at output time, not at editorial time, so authors don’t have to think about it.
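A minimal sketch of that output-time cleanup, assuming a small hand-picked entity table (the cleanText name and the ENTITIES map are hypothetical, and a production version would likely use a full entity decoder). Entities are decoded first so that a numeric entity for a dash or ellipsis is normalized by the later steps.

```typescript
// Small, hand-picked entity table; extend as needed.
const ENTITIES: Record<string, string> = {
  amp: "&",
  quot: '"',
  hellip: "...",
  nbsp: " ",
  lt: "<",
  gt: ">",
};

function cleanText(input: string): string {
  return input
    // 1. Decode numeric entities such as &#8212; into real characters.
    .replace(/&#(\d+);/g, (_m: string, n: string) => String.fromCodePoint(Number(n)))
    // 2. Decode the named entities we know about; leave unknown ones as-is.
    .replace(/&([a-z]+);/g, (m: string, name: string) => ENTITIES[name] || m)
    // 3. En-dash and em-dash become a plain hyphen.
    .replace(/[\u2013\u2014]/g, "-")
    // 4. Smart quotes become straight quotes.
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"')
    // 5. Unicode ellipsis becomes three dots.
    .replace(/\u2026/g, "...");
}

console.log(cleanText("Fast \u2014 and \u201Csimple\u201D &amp; it\u2019s free&#8230;"));
```

Running this in the rendering path, not in the CMS, means the source copy stays untouched and every generated line gets the same treatment.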

4. Worker route patterns and query strings

A Cloudflare Worker route bound to wbcomdesigns.com/llms.txt matches that exact path. If you hit /llms.txt?cb=12345 to bypass cache, the route may not match in some edge cases and the request falls through to origin. We saw this in testing and reproduced it three times before realising the route binding was strict on path alone.

The reliable cache bust is to bump a versioned cache key inside the Worker (/__worker/llms.txt?v=N against caches.default) instead of adding query strings to the public URL. Bonus: that pattern keeps the public URL clean for the AI bots that won’t add query strings.
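The versioned-key idea can be sketched as follows. The TextCache interface is a deliberately simplified stand-in for the Worker cache (the real caches.default works with Request and Response objects), and CACHE_VERSION, versionedCacheKey, and getLlmsTxt are hypothetical names for illustration.

```typescript
// Bump this on every deploy to invalidate the cached copy (hypothetical value).
const CACHE_VERSION = 7;

// Build the internal cache key; any query string on the public URL is ignored.
function versionedCacheKey(publicUrl: string, version: number): string {
  const origin = publicUrl.replace(/^(https?:\/\/[^\/?#]+).*$/, "$1");
  return `${origin}/__worker/llms.txt?v=${version}`;
}

// Simplified stand-in for the edge cache, storing plain text by key.
interface TextCache {
  match(key: string): Promise<string | undefined>;
  put(key: string, body: string): Promise<void>;
}

// Serve from cache when the current version is present, otherwise build and store.
async function getLlmsTxt(
  cache: TextCache,
  publicUrl: string,
  build: () => Promise<string>,
): Promise<string> {
  const key = versionedCacheKey(publicUrl, CACHE_VERSION);
  const hit = await cache.match(key);
  if (hit !== undefined) return hit;
  const body = await build();
  await cache.put(key, body);
  return body;
}
```

Because the key is derived from the origin and a version number only, /llms.txt and /llms.txt?cb=12345 resolve to the same cached body, and a deploy that bumps the version starts from a cold entry.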

5. WordPress rewrite-rule order eats your /llms.txt

This one is WordPress-specific and infuriating. If your theme or another plugin registers a catch-all rewrite that handles 404 fallbacks before Yoast’s llms.txt handler runs, requests to /llms.txt get served the homepage HTML instead of the file. The signal: Content-Type: text/html and a status of 200 with your homepage body. To a casual eye the 200 looks fine, but the response is useless to AI bots.

Two fixes. First, ensure Yoast (or whatever generates llms.txt) hooks parse_request at a high priority. Yoast 27.5+ does this by default; older versions don’t. Second, audit your theme’s functions.php for add_rewrite_rule or template_redirect handlers that swallow unknown paths. We found one on a client site that was rewriting any 404 to the homepage for “SEO juice”, which had been silently breaking a dozen plugin-generated routes for years.

If you can’t fix the rewrite-rule order, drop in a small mu-plugin that intercepts /llms.txt at priority 1 and serves your own version. mu-plugins load before regular plugins and themes, so they win the priority race even when the rest of the site is misbehaving. The reference implementation we use across our agency portfolio handles posts, pages, and EDD downloads in one file; deploying it is a single SCP into wp-content/mu-plugins/ and a transient cache flush.

How to verify your /llms.txt is actually working

Run this against your own site. The four checks below catch every common failure mode:

URL=https://your-site.com/llms.txt

# 1. HTTP basics: should be 200, text/plain, with reasonable cache headers
curl -sI -A "Mozilla/5.0 (compatible; ClaudeBot/1.0)" "$URL"

# 2. Spec shape: first line should be "# Site Name"
curl -s "$URL" | head -20

# 3. Quality flags: these should all be zero
curl -s "$URL" | grep -cP '[\x{2013}\x{2014}]'                    # en-dashes and em-dashes
curl -s "$URL" | grep -cE '&(amp|quot|hellip|#[0-9]+);'           # leftover HTML entities
curl -s "$URL" | grep -cP "[\x{2018}\x{2019}\x{201C}\x{201D}]"    # smart quotes

# 4. Bot accessibility: every AI assistant UA should get 200
for UA in "ChatGPT-User/1.0" "Claude-User/1.0" "Claude-SearchBot/1.0" "GPTBot/1.0" "Perplexity-User/1.0"; do
  STATUS=$(curl -s -o /dev/null -A "$UA" -w "%{http_code}" "$URL")
  printf "  %-25s -> %s\n" "$UA" "$STATUS"
done

If any of those return something unexpected, fix it before celebrating. A perfect llms.txt that returns 403 to GPTBot is an llms.txt that isn’t doing its job. The most common cause of the 403 is what I called pitfall #2: the Bot Management AI block firing before the custom WAF allow rule.

What it looks like when this actually works

A few weeks after we shipped these changes across our portfolio, we started seeing wbcomdesigns.com URLs show up in ChatGPT search results for queries like “self-hosted bbPress alternative.” Before the worker overlay, those queries surfaced Discourse, Flarum, and a Reddit thread. After, our Jetonomy and bbPress Alternative pages started getting cited directly.

The cause is simple: when an AI model has a curated, well-described list of 111 plugins to choose from, it picks the relevant one. When all it has is your homepage and a generic 5-item curation, it falls back to whatever Wikipedia and Reddit told it.

This is the same pattern I described in the WordPress CRM bridge problem. The platform underneath matters less than the wiring you put on top of it. WordPress is fine. Yoast and Rank Math are fine. The wiring that exposes your catalogue to AI assistants is the missing piece.

We’re tracking this with two cheap measurements. First, server logs filtered by ChatGPT, Claude, and Perplexity user-agents (we set up a small awk script that runs on the Cloudways access log nightly). Second, occasional manual spot-checks where we ask ChatGPT a long-tail query and see if our pages come up. Neither is rigorous, but together they paint a clear picture: traffic from AI-assistant UAs has roughly tripled since we shipped the full-catalogue Worker overlay. It’s not yet enough volume to compare to organic search, but the trajectory is unambiguous.
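The log measurement is simple enough to sketch. This is not the awk script we run; it is a rough TypeScript equivalent under the assumption that each access-log line contains the raw User-Agent string. The agent list reuses the tokens from the verification script, and countAiHits is a hypothetical name.

```typescript
// User-agent tokens to look for; adjust to the bots you care about.
const AI_AGENTS = [
  "ChatGPT-User",
  "GPTBot",
  "ClaudeBot",
  "Claude-User",
  "Claude-SearchBot",
  "Perplexity-User",
];

// Count hits per AI user agent across raw access-log lines.
function countAiHits(logLines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of logLines) {
    // Case-sensitive substring match; the first matching token wins.
    const agent = AI_AGENTS.find((a) => line.indexOf(a) !== -1);
    if (agent) counts[agent] = (counts[agent] || 0) + 1;
  }
  return counts;
}
```

Comparing these counts week over week is enough to see a trend, even though user-agent strings can be spoofed and the numbers should be read as approximate.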

The TL;DR for someone shipping this next week

Blog or content-only WordPress site: install or update Rank Math (or Yoast 27.5+). Bump the post cap to 500 if Rank Math, or accept the 5-item Yoast curation if your post depth is shallow. Verify with the four-check script.
WordPress with EDD or WooCommerce: keep Yoast as the base. Run a Cloudflare Worker on the /llms.txt route that intercepts and overlays the truncated sections with a pre-built JSON of all your products. Refresh the JSON weekly with a local Node script that scrapes meta descriptions.
Static site (Astro, Next, Eleventy, Hugo): add a build-time endpoint that emits the file from your content collection. No runtime cost, regenerates on every push.
Whichever pattern you pick: strip em-dashes, set Cache-Control: public, max-age=3600, return text/plain, and verify the file loads cleanly with each of the major AI bot UAs.

If you’re building anything around AI in WordPress, the AI agentic backlash, or the new economics of free AI plans, llms.txt is one of the cheaper distribution moves you can make this quarter.

Need help shipping llms.txt on your own site?

This is one of those infrastructure pieces that sounds simple in a blog post but breaks in five surprising ways the first time you ship it on a real site. Cache rules, Worker subrequest limits, Bot Management defaults, plugin-version mismatches, the WordPress rewrite-rule order. Two hours into your first attempt you’ll have most of it working and a 403 you can’t explain.

If your WordPress site, plugin store, or static landing page needs a production-grade llms.txt and the trial-and-error route isn’t worth the time, reach out. We’ve shipped this across 30+ sites in our agency portfolio and product store. The patterns above are the ones we keep iterating on, and we set them up on client sites in a single afternoon with the verification checklist already in place.

The AI search wave is happening with or without you on the index. Better to be the first plugin in your category that ChatGPT cites than the one that quietly disappears from the conversation. Two years ago this would have been a hypothetical; today the queries are landing, the citations are happening, and the sites that haven’t shipped a real llms.txt are becoming invisible by default. The good news is the fix is mostly weekend-sized: pick the right pattern from above, ship it, run the verification script, move on.