LLMs.txt Curator
LLMs.txt Curator helps site owners generate and curate a high-quality llms.txt file for AI assistants and retrieval systems, ensuring only relevant, well-described content is exposed to large language models.
It generates and maintains llms.txt and llms-full.txt, the files defined by the emerging llms.txt standard for telling AI systems (ChatGPT, Claude, Perplexity, Gemini, and others) which pages on your site matter most and what they contain.
Unlike auto-generators that dump every URL into a flat file, LLMs.txt Curator takes a curation-first approach. You choose the pages, organise them into sections, fill descriptions, override titles for AI, validate quality, and see exactly which AI bots are reading your file — all from a single interface.
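To make the curation model concrete, here is a minimal Python sketch of how curated sections could be rendered into the layout the llms.txt proposal describes (an H1 title, a blockquote summary, one ## heading per section, and a link list). It is not the plugin's code: the Page and Section classes, the ai_title field, and render_llms_txt are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    url: str
    description: str = ""
    ai_title: str = ""  # hypothetical per-page title override for AI consumption


@dataclass
class Section:
    name: str
    pages: list[Page] = field(default_factory=list)


def render_llms_txt(site_name: str, summary: str, sections: list[Section]) -> str:
    """Render curated sections as llms.txt-style Markdown (illustrative only)."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.name}")
        lines.append("")
        for page in section.pages:
            title = page.ai_title or page.title  # the override wins when it is set
            entry = f"- [{title}]({page.url})"
            if page.description:
                entry += f": {page.description}"
            lines.append(entry)
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"
```

A page with a title override appears under its AI-facing title, while the on-site title stays untouched in the source data.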
What makes this different
Most llms.txt plugins treat the file as a static output. LLMs.txt Curator treats it as a living asset:
- Quality Score — every generated file shows your coverage percentage, which pages have descriptions, and which still need attention.
- Description Suggestions — one click fills all missing descriptions from a five-step fallback chain (schema markup → SEO meta → excerpt → Open Graph → page content). Never overwrites what you’ve set manually.
- Change Detection — a banner appears in your admin when curated pages have been updated since the last generation, with a Regenerate Now button. Your llms.txt stays current without manual checking.
- Per-Page Title Override — set a different title for each page specifically for AI consumption, without touching your on-site SEO.
- Safety Mode — blocks generation if validation errors exist, preventing a broken file from going live. Includes an on-demand validator with plain-English results.
- AI Crawler Analytics — see which bots visited in the last 7 days, with a visual bar chart. OpenAI, Anthropic, Perplexity, Google — know who is actually reading your file.
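The Change Detection idea can be sketched in a few lines of Python. This assumes the check is a comparison of each curated page's last-modified time against the time of the last generation; the source does not say how the plugin detects changes, and both function names are hypothetical.

```python
from datetime import datetime


def pages_changed_since(modified_at: dict[str, datetime], last_generated: datetime) -> list[str]:
    """Return the URLs whose last-modified time is later than the last generation."""
    return sorted(url for url, modified in modified_at.items() if modified > last_generated)


def needs_regeneration(modified_at: dict[str, datetime], last_generated: datetime) -> bool:
    """True when at least one curated page changed after the last generation."""
    return bool(pages_changed_since(modified_at, last_generated))
```

A non-empty result is the condition under which a banner like the one described above would be shown.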
Core features
- Drag-and-drop curation — reorder sections and pages visually; each section becomes an ## heading per the spec
- llms-full.txt generation — the companion file with full Markdown content for each curated page
- Five SEO plugin integrations — Rank Math, Yoast SEO, All in One SEO, SEOPress, The SEO Framework
- Schema-aware descriptions — uses structured data before falling back through the description chain
- Scheduled regeneration — choose Instant (on publish), Daily, Weekly, or Manual
- WooCommerce support — SKU, price, stock, categories, dimensions; respects product visibility
- WordPress Multisite — network-activate across all sites; each site manages its own independent llms.txt; Network Admin overview with per-site and bulk regenerate
- Pre-built templates — Business, E-commerce, SaaS, Blog, Local Business
- Live preview — see your exact llms.txt output before saving
- Import / export — move your configuration between sites as JSON
- WP-CLI — wp llms-txt regenerate, wp llms-txt status, wp llms-txt crawler-log
- REST API — POST /wp-json/llms-txt/v1/regenerate, GET /wp-json/llms-txt/v1/status
- Atomic file writes — temp file → rename; no half-written files served to bots
- ETag/304 caching — proper HTTP headers for CDN revalidation
- Subdirectory / Bedrock support — correctly finds the site root on non-standard installs
- Robots.txt reference — automatically adds a spec-compliant comment
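Two items in the list above, atomic file writes and ETag/304 caching, are general techniques that can be shown outside WordPress. The Python sketch below illustrates the pattern only and is not the plugin's implementation: write_atomically, etag_for, and status_for_request are hypothetical names, and using a SHA-256 content hash as the ETag is an assumption.

```python
import hashlib
import os
import tempfile
from typing import Optional


def write_atomically(path: str, content: str) -> None:
    """Write to a temp file, then rename it over the target in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".llms-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # readers see the old file or the new one, never a partial one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def etag_for(content: bytes) -> str:
    """Derive a quoted strong ETag from the file content (assumed scheme)."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'


def status_for_request(content: bytes, if_none_match: Optional[str]) -> int:
    """Return 304 when the client's If-None-Match header matches, else 200."""
    if if_none_match is None:
        return 200
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
    if "*" in candidates or etag_for(content) in candidates:
        return 304
    return 200
```

Because the ETag depends only on content, a CDN that revalidates after a regeneration with no actual changes still receives a 304.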
Description Suggestions in detail
When pages lack descriptions, AI systems get less context. The suggestion engine fills the gap automatically, trying each source in this order:
- Schema markup (Rank Math, custom _schema_json)
- SEO plugin meta description
- WordPress excerpt
- Open Graph description (_og_description / og_description)
- First 160 characters of post content
Already-set descriptions are never touched. Pages that can’t be filled automatically are listed for manual review.
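The fallback chain can be sketched as an ordered list of lookups in which the first non-empty value wins. This Python version is illustrative only: the dictionary keys (description, schema_description, seo_meta_description, and so on) are hypothetical stand-ins for the plugin's real data sources.

```python
from typing import Callable, Optional

Source = Callable[[dict], Optional[str]]


def first_160_chars(page: dict) -> Optional[str]:
    """Last resort: the first 160 characters of whitespace-normalised content."""
    text = " ".join((page.get("content") or "").split())
    return text[:160] or None


# Order mirrors the five steps listed above; key names are assumptions.
FALLBACK_CHAIN: list[tuple[str, Source]] = [
    ("schema", lambda p: p.get("schema_description")),
    ("seo_meta", lambda p: p.get("seo_meta_description")),
    ("excerpt", lambda p: p.get("excerpt")),
    ("open_graph", lambda p: p.get("og_description")),
    ("content", first_160_chars),
]


def suggest_description(page: dict) -> tuple[Optional[str], str]:
    """Return (description, source); a manually set description is never replaced."""
    manual = (page.get("description") or "").strip()
    if manual:
        return manual, "manual"
    for name, source in FALLBACK_CHAIN:
        candidate = (source(page) or "").strip()
        if candidate:
            return candidate, name
    return None, "needs_review"  # nothing usable: list the page for manual review
```

Returning the source name alongside the text makes it easy to show where each suggestion came from.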
Quality Score
Every generated llms.txt ends with a coverage report:
Quality Score: 94%
Pages included: 48
Pages with descriptions: 45
Pages missing descriptions: 3
This is visible to the AI systems reading your file, and to you in the Preview tab.
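The sample figures are consistent with a simple coverage ratio: 45 of 48 pages is 93.75%, which rounds to 94%. A minimal Python sketch, assuming the score is the share of pages with a non-empty description rounded to the nearest whole percent (the source does not give the formula; coverage_report and the description key are hypothetical):

```python
def coverage_report(pages: list[dict]) -> str:
    """Build a coverage report in the four-line layout shown above."""
    total = len(pages)
    described = sum(1 for page in pages if (page.get("description") or "").strip())
    score = round(100 * described / total) if total else 0
    return "\n".join([
        f"Quality Score: {score}%",
        f"Pages included: {total}",
        f"Pages with descriptions: {described}",
        f"Pages missing descriptions: {total - described}",
    ])
```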
AI Crawler Analytics
Track 12 known bots: GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Meta-ExternalAgent, Bytespider, CCBot, Cohere, DeepSeek, Amazonbot.
The 7-day analytics card shows a visual bar chart of recent activity. All-time totals are kept separately. IP addresses are anonymised before storage — last octet zeroed for IPv4, last 80 bits for IPv6. No data leaves your server.
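The anonymisation rule amounts to keeping the first 24 bits of an IPv4 address (32 minus 8) and the first 48 bits of an IPv6 address (128 minus 80). A short Python sketch using the standard ipaddress module; anonymise_ip is a hypothetical name, not the plugin's function:

```python
import ipaddress


def anonymise_ip(raw: str) -> str:
    """Zero the last octet of an IPv4 address or the last 80 bits of an IPv6 address."""
    address = ipaddress.ip_address(raw)
    keep_bits = 24 if address.version == 4 else 48
    # strict=False lets ip_network mask off the host bits instead of raising.
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)
```

For example, 203.0.113.57 becomes 203.0.113.0, and 2001:db8:abcd:1234:5678:9abc:def0:1 becomes 2001:db8:abcd::.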
Safety Mode
Before generation, the validator checks:
- Pages missing or unpublished
- Duplicate URLs across sections
- Noindex conflicts
- Canonical mismatches
- Password-protected pages
- Thin content (< 100 words, no meta, no excerpt)
- File size > 50 KB
- Too many pages (> 80)
When Safety Mode is on, errors block generation and the results are shown immediately. Warnings are surfaced but don’t block. Everything is explained in plain English.
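The numeric checks and the blocking rule can be sketched as follows. The source does not say which checks count as errors and which as warnings, so this Python sketch leaves severity to the caller. The function names and dictionary keys are hypothetical, and treating 50 KB as 50 * 1024 bytes is an assumption.

```python
MAX_FILE_BYTES = 50 * 1024  # assumption: "50 KB" means 50 * 1024 bytes
MAX_PAGES = 80
MIN_WORDS = 100


def is_thin(page: dict) -> bool:
    """Thin content: under 100 words with no meta description and no excerpt."""
    words = len((page.get("content") or "").split())
    return words < MIN_WORDS and not page.get("meta_description") and not page.get("excerpt")


def find_limit_issues(pages: list[dict], rendered: str) -> list[str]:
    """Check the three numeric limits from the list above."""
    issues = []
    for page in pages:
        if is_thin(page):
            issues.append(f"Thin content: {page.get('url', '(unknown URL)')}")
    if len(rendered.encode("utf-8")) > MAX_FILE_BYTES:
        issues.append("File size exceeds 50 KB")
    if len(pages) > MAX_PAGES:
        issues.append("Too many pages (more than 80)")
    return issues


def may_generate(errors: list[str], safety_mode: bool) -> bool:
    """Errors block generation only while Safety Mode is on; warnings never block."""
    return not (safety_mode and errors)
```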
Scheduled regeneration
- Instant — regenerates ~30 seconds after any page is published or trashed (default)
- Daily — once per day, via WP-Cron recurring event
- Weekly — once per week
- Manual — only when you click Save & Generate
How to get started
- Install and activate. The plugin auto-scans your content and creates initial sections.
- Curate: drag, drop, add, remove. Aim for 20-60 pages that best represent your site.
- Click Generate Missing Descriptions to fill gaps automatically.
- Fix any warnings shown in the Safety Mode card.
- Set the update mode that suits your workflow.
- Save & Generate. Your files are live at /llms.txt and /llms-full.txt.
- Enable the AI Crawler Log to see which bots start visiting.
WooCommerce integration
When WooCommerce is active, the plugin automatically includes SKU, price, and stock status in llms.txt descriptions, and full product details in llms-full.txt. Products with “hidden” visibility are excluded, and you can optionally exclude out-of-stock products.
Developer hooks
- llmscu_capability filter — override the required capability (default: manage_options)
- llmscu_post_limit filter — scanner post limit per type (default: 500)
- llmscu_full_word_limit filter — per-page word cap in llms-full.txt (default: unlimited)
- llmscu_regenerated action — fires after each successful regeneration with the content string
