Publish a /llms.txt file at your domain root
The llms.txt proposal puts a curated, LLM-friendly map of your best content at /llms.txt so models can find what matters at inference time.
llms.txt Generator
Start the file with a single H1 project/site name
The H1 title is the only strictly required element of the spec; without it parsers can't identify the document.
llms.txt Validator
Add a blockquote summary directly under the H1
A short blockquote gives the model the key context needed to understand the rest of the file before reading any links.
llms.txt Validator
Group links under H2 section headers (e.g. Docs, Guides, About)
H2-delimited file lists are how the spec organises links so an LLM can pick the right section for a query.
llms.txt Validator
Give each link a descriptive title and a short note after the colon
The `[title](url): note` format lets the model judge relevance without fetching every page.
llms.txt Validator
Put skippable links under an '## Optional' section
The spec reserves an Optional section for secondary content that can be dropped when a shorter context window is needed.
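The structure rules above (one H1, a blockquote summary, H2 link sections, `[title](url): note` entries, and a trailing Optional section) can be sketched as a small generator. This is a minimal illustration, not the spec's reference tooling; the function name `build_llms_txt`, the sample titles and the example.com URLs are all hypothetical.

```python
def build_llms_txt(title, summary, sections, optional=None):
    """Render llms.txt text: H1, blockquote, H2 link sections, Optional last."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    groups = list(sections.items())
    if optional:
        # Skippable links go last, under the reserved "Optional" heading.
        groups.append(("Optional", optional))
    for heading, links in groups:
        lines.append(f"## {heading}")
        lines.append("")
        for link_title, url, note in links:
            lines.append(f"- [{link_title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data, for illustration only.
doc = build_llms_txt(
    "Example Project",
    "A hypothetical documentation site used for illustration.",
    {"Docs": [("Quick start", "https://example.com/docs/start.md", "Install and first run")]},
    optional=[("Changelog", "https://example.com/changelog.md", "Release history")],
)
print(doc)
```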
llms.txt Validator
Curate to your highest-value pages, not your whole sitemap
llms.txt is a hand-picked guide; dumping every URL dilutes it and wastes the model's limited context.
llms.txt Generator
Publish /llms-full.txt with full-text content in one file
llms-full.txt concatenates your actual content so a model can ingest everything in a single fetch (Anthropic, Cloudflare and Vercel ship one).
llms-full.txt Generator
Serve llms.txt as static text/plain with no redirects
If the file is not returned as text/plain, or is affected by redirects, rewrites or CDN caching problems, it may not be processed correctly.
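A rough way to express this check is a pure function over the response details you have already collected with whatever HTTP client you use. The function name `check_llms_response` and its parameters are assumptions for this sketch; it only encodes the status, text/plain and no-redirect expectations stated above.

```python
def check_llms_response(status, content_type, redirect_count):
    """Return a list of problems with how /llms.txt was served (empty if none)."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {media_type or 'no Content-Type'}")
    if redirect_count > 0:
        problems.append(f"{redirect_count} redirect(s) before the final response")
    return problems


print(check_llms_response(200, "text/plain; charset=utf-8", 0))
print(check_llms_response(200, "text/html", 1))
```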
Keep llms.txt in sync with your live content
A stale llms.txt that links to moved or deleted pages sends LLMs to dead ends and erodes trust in the file.
llms.txt Diff Tool
Offer a clean Markdown copy of pages at the same URL + .md
The spec recommends a `.md` (or `index.html.md`) twin of each page so models read pure content instead of parsing HTML.
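The URL mapping can be sketched as follows. The helper name `markdown_twin` is hypothetical, and the sketch assumes a path ending in `/` (or an empty path) means "no file name", so it gets `index.html.md`; everything else gets `.md` appended.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_twin(url):
    """Map a page URL to the URL where its Markdown twin would live."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # No file name in the URL: use the index.html.md form.
        path += "index.html.md"
    else:
        path += ".md"
    # Query string and fragment are dropped; that is a simplification of this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(markdown_twin("https://example.com/docs/install.html"))
print(markdown_twin("https://example.com/docs/"))
```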
Page-to-Markdown Exporter
Have a valid robots.txt at the domain root
robots.txt is still the primary, widely-honoured signal AI crawlers check before fetching your pages.
Robots.txt Generator
Decide deliberately whether each AI bot is allowed or blocked
Allowing training/search bots feeds AI answers; blocking them protects content — either way it should be a choice, not an accident.
AI Bot robots.txt Checker
Set rules for OpenAI's GPTBot, OAI-SearchBot and ChatGPT-User
OpenAI splits training (GPTBot), search indexing (OAI-SearchBot) and live user fetches (ChatGPT-User) — each needs an explicit directive.
AI Bot robots.txt Generator
Set rules for ClaudeBot, Claude-SearchBot and Claude-User
Anthropic's three bots cover training, in-product search and user-initiated fetches, giving granular robots.txt control.
AI Bot robots.txt Generator
Configure Google-Extended separately from Googlebot
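Google-Extended controls whether your content is used for Gemini model training and grounding without affecting normal Google Search indexing; AI Overviews are part of Search and follow your Googlebot rules.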
AI Bot robots.txt Generator
Cover PerplexityBot, CCBot, Amazonbot, Applebot-Extended, Meta-ExternalAgent
These bots feed Perplexity, Common Crawl, Alexa, Apple Intelligence and Meta AI — the rest of the AI citation surface.
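One way to make the allow/block decision explicit for every bot named above is to generate the robots.txt groups from a policy table and refuse to run when a bot has no recorded decision. A minimal sketch: the `robots_groups` function and the sample policy are hypothetical, and whole-site Allow/Disallow is a simplification.

```python
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "Google-Extended", "PerplexityBot", "CCBot",
    "Amazonbot", "Applebot-Extended", "Meta-ExternalAgent",
]


def robots_groups(policy):
    """Render one robots.txt group per AI bot from a {bot: allowed} mapping."""
    missing = [bot for bot in AI_BOTS if bot not in policy]
    if missing:
        # Force a deliberate choice instead of silently falling back to "*".
        raise ValueError(f"no decision recorded for: {', '.join(missing)}")
    blocks = []
    for bot in AI_BOTS:
        rule = "Allow: /" if policy[bot] else "Disallow: /"
        blocks.append(f"User-agent: {bot}\n{rule}")
    return "\n\n".join(blocks) + "\n"


# Example policy (an assumption): allow everything except two training crawlers.
policy = {bot: True for bot in AI_BOTS}
policy["GPTBot"] = False
policy["CCBot"] = False
robots_text = robots_groups(policy)
print(robots_text)
```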
AI Bot robots.txt Generator
Confirm you aren't blanket-blocking all bots with a wildcard Disallow
A stray `User-agent: *` followed by `Disallow: /` hides your whole site from the LLM crawlers you actually want to reach it.
Robots.txt Tester
Test that key URLs are actually crawlable for AI user-agents
Path-level rules interact in non-obvious ways; testing real URLs against each bot confirms intent matches reality.
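Testing real URLs against each bot can be sketched with Python's standard `urllib.robotparser`. The robots.txt content and URLs below are made-up examples. Note that this parser applies the first matching rule in file order, which can differ from crawlers that use longest-match precedence, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for this example.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""


def crawl_matrix(robots_txt, bots, urls):
    """Return {(bot, url): allowed} for every bot/URL combination."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {(bot, url): parser.can_fetch(bot, url) for bot in bots for url in urls}


matrix = crawl_matrix(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot", "PerplexityBot"],
    ["https://example.com/docs/", "https://example.com/private/plan"],
)
for (bot, url), allowed in matrix.items():
    print(f"{bot:15} {'allow' if allowed else 'block'} {url}")
```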
Robots.txt URL Simulator
Reference your XML sitemap from robots.txt
A sitemap directive helps every crawler — AI included — discover your full canonical URL set.
XML Sitemap Validator
Ensure llms.txt and robots.txt don't contradict each other
Pointing LLMs to a page in llms.txt while blocking it in robots.txt sends a confused, self-defeating signal.
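A contradiction check can be sketched by extracting every link from llms.txt and asking a robots.txt parser whether each bot may fetch it. The function name `blocked_llms_links`, the regex and the sample files are assumptions for illustration; the standard-library parser is an approximation of how real crawlers evaluate rules.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches the URL part of Markdown links such as [title](https://...).
LINK_RE = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")


def blocked_llms_links(llms_txt, robots_txt, bots):
    """List (bot, url) pairs that llms.txt advertises but robots.txt blocks."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    conflicts = []
    for url in LINK_RE.findall(llms_txt):
        for bot in bots:
            if not parser.can_fetch(bot, url):
                conflicts.append((bot, url))
    return conflicts


# Hypothetical files for illustration.
llms = (
    "# Example\n\n## Docs\n\n"
    "- [Guide](https://example.com/guide.md): Setup\n"
    "- [Internal](https://example.com/private/notes.md): Notes\n"
)
robots = "User-agent: *\nDisallow: /private/\n"
conflicts = blocked_llms_links(llms, robots, ["ClaudeBot"])
print(conflicts)
```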
llms.txt vs robots.txt Consistency Checker
Serve critical content in static server-rendered HTML
Analysis of 500M+ GPTBot fetches found zero JavaScript execution — AI crawlers read raw HTML and never wait for rendering.
Don't hide primary content behind client-side JavaScript
Content injected by JS in SPAs is completely invisible to GPTBot, ClaudeBot and PerplexityBot.
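To see what a non-rendering crawler would get, you can check whether a key phrase appears in the visible text of the raw HTML, ignoring anything inside script or style tags. A minimal sketch using the standard `html.parser`; the class and function names and both sample pages are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_in_raw_html(html, phrase):
    """True if the phrase is present without executing any JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return phrase.lower() in " ".join(parser.chunks).lower()


server_rendered = "<html><body><h1>Pricing</h1><p>Plans start at $10.</p></body></html>"
spa_shell = '<html><body><div id="root"></div><script>render("Plans start at $10.")</script></body></html>'
print(visible_in_raw_html(server_rendered, "Plans start at $10."))
print(visible_in_raw_html(spa_shell, "Plans start at $10."))
```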
Page-to-Markdown Exporter
Verify each page converts cleanly to Markdown
If a page collapses into clean Markdown, an LLM can ingest it faithfully; if it turns to noise, content is being lost.
Page-to-Markdown Exporter
Use semantic HTML (heading hierarchy, lists, tables, articles)
Header-based structure lets retrieval systems split content along real topic boundaries instead of guessing.
Semantic Structure Analyzer
Keep a logical, sequential heading order (no skipped levels)
A clean H1→H2→H3 outline gives chunkers a reliable map of your document's structure.
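A skipped-level check can be sketched by reading heading levels in document order and flagging any jump of more than one level downward. The regex is a rough heuristic (it does not account for HTML comments or headings injected by scripts) and the function name is hypothetical.

```python
import re


def skipped_heading_levels(html):
    """Return (previous_level, current_level) pairs where a level was skipped."""
    levels = [int(n) for n in re.findall(r"<h([1-6])\b", html, flags=re.IGNORECASE)]
    problems = []
    for prev, cur in zip(levels, levels[1:]):
        # Going deeper by more than one level (e.g. H2 to H4) skips a level;
        # moving back up to a shallower heading is fine.
        if cur > prev + 1:
            problems.append((prev, cur))
    return problems


print(skipped_heading_levels("<h1>A</h1><h2>B</h2><h4>C</h4><h2>D</h2>"))
print(skipped_heading_levels("<h1>A</h1><h2>B</h2><h3>C</h3><h2>D</h2>"))
```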
Semantic Structure Analyzer
Maintain a high text-to-HTML ratio (content over markup bloat)
Heavy markup, inline scripts and tracking code bury the actual text models care about and waste their context.
Text-to-HTML Ratio Checker
Use descriptive link text instead of 'click here'
Link anchors are strong context signals; meaningful anchors help models understand where a link leads.
Semantic Structure Analyzer
Give images and diagrams meaningful alt/caption text
Text-only crawlers can't see images; alt text and captions are the only way that information reaches an LLM.
Set canonical tags so models ingest one authoritative version
Duplicate URLs split signals and risk an AI citing a parameterised or stale copy of your page.
Add JSON-LD structured data for your key entities
JSON-LD is the structured data format Google recommends, and ChatGPT, Perplexity and Google AI Overviews increasingly leverage it.
Schema Markup Tester
Fill out schema fully (not just the minimum required fields)
Richer, complete markup gives machines an unambiguous description of who and what you are.
Schema Completeness Scorer
Define an Organization entity with a stable @id
A persistent @id lets you reference the same entity across pages and tie everything back to one identity.
Schema Completeness Scorer
Link entities to Wikidata/Wikipedia via sameAs
sameAs Q-IDs anchor your brand in the canonical knowledge base behind Google, ChatGPT, Claude and Perplexity — 'non-negotiable' for LLM search.
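An Organization entity with a stable `@id` and `sameAs` links can be sketched as a dictionary serialised to JSON-LD. The helper name, the `#organization` fragment convention, the company name and the `Q0` identifier are placeholders, not real data; substitute your own Q-ID.

```python
import json


def organization_jsonld(name, site_url, wikidata_qid, wikipedia_url=None):
    """Build a schema.org Organization with a stable @id and sameAs links."""
    same_as = [f"https://www.wikidata.org/wiki/{wikidata_qid}"]
    if wikipedia_url:
        same_as.append(wikipedia_url)
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        # One fixed identifier that other pages can reference.
        "@id": f"{site_url.rstrip('/')}/#organization",
        "name": name,
        "url": site_url,
        "sameAs": same_as,
    }


# "Q0" is a placeholder Q-ID for illustration only.
entity = organization_jsonld("Example Co", "https://example.com/", "Q0")
print(f'<script type="application/ld+json">\n{json.dumps(entity, indent=2)}\n</script>')
```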
Wikidata Entity Presence Checker
Make sure your brand exists as a Wikidata entity
If there's no Q-ID to point to, models can't disambiguate your brand from similarly-named ones.
Wikidata Entity Presence Checker
Cover the entities and topics LLMs expect for your niche
Filling entity gaps signals topical authority that retrieval and answer engines reward.
Entity Coverage Gap Analyzer
Write self-contained passages that make sense out of context
Retrieved chunks often start mid-argument; self-contained passages stop the model from hedging or hallucinating.
Passage Chunk Analyzer
Keep sections roughly chunk-sized (~300–500 words)
Recursive ~512-token chunks topped a Feb-2026 benchmark; sections near that size retrieve cleanly without splitting mid-thought.
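A simple way to spot sections outside the ~300–500 word range is to split a Markdown copy of the page on headings and count words per section. This sketch is an assumption-laden heuristic: it treats any line starting with `#` as a heading (so `#` comments inside fenced code would be miscounted) and uses whitespace-separated words rather than tokens.

```python
import re


def section_word_counts(markdown):
    """Return [(heading, word_count)] for each non-empty heading-delimited section."""
    sections = []
    heading, words = "(preamble)", 0
    for line in markdown.splitlines():
        match = re.match(r"#{1,6}\s+(.*)", line)
        if match:
            sections.append((heading, words))
            heading, words = match.group(1).strip(), 0
        else:
            words += len(line.split())
    sections.append((heading, words))
    return [(h, n) for h, n in sections if n > 0]


def out_of_range(markdown, low=300, high=500):
    """Sections whose word count falls outside the target range."""
    return [(h, n) for h, n in section_word_counts(markdown) if not low <= n <= high]


sample = "# Guide\n\n## Short\n" + "word " * 50 + "\n\n## Right\n" + "word " * 400 + "\n"
print(section_word_counts(sample))
print(out_of_range(sample))
```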
Passage Chunk Analyzer
Keep one topic per section under a clear heading
Header-based 'by title' chunking keeps each topic in its own retrievable unit, boosting precision.
Passage Chunk Analyzer
Define key terms in plain, standalone sentences
Clean definition blocks are easy for models to extract and quote verbatim as answers.
Definition Block Detector
Front-load the direct answer at the top of each section
Answer-first writing means the most quotable sentence sits where retrieval and summarisation grab it.
Passage Chunk Analyzer
Back claims with statistics and concrete data
Specific numbers are highly citable and signal substance LLMs prefer to quote.
Statistic & Citation Density Scorer
Use explicit question-and-answer formatting where natural
Q&A pairs map directly onto user prompts, making your content an easy retrieval match.
Semantic Structure Analyzer
Measure an overall LLM-optimization score and track it
A single score turns the dozens of signals here into a number you can watch improve over time.
LLMO Score Analyzer
Declare your AI usage policy (e.g. an ai.txt file)
A machine-readable policy file states up front how AI systems may use your content, beyond simple allow/block.
ai.txt Generator
Reserve text-and-data-mining rights via TDMRep (tdmrep.json / headers / meta)
TDMRep gives you a machine-readable way to express the text-and-data-mining rights reservation in CDSM Directive Article 4, which the EU AI Act requires model providers to respect, so the signal has legal teeth in the EU.
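As a rough illustration of the tdmrep.json option, the sketch below generates a rule list of the kind served from `/.well-known/tdmrep.json`. The field names (`location`, `tdm-reservation`, `tdm-policy`) follow my reading of the TDMRep specification and should be checked against it; the paths and policy URL are hypothetical.

```python
import json


def tdmrep_rules(reserved_paths, policy_url=None):
    """Build TDMRep-style rules that reserve TDM rights for the given paths."""
    rules = []
    for path in reserved_paths:
        # 1 means rights are reserved; 0 would mean they are not.
        rule = {"location": path, "tdm-reservation": 1}
        if policy_url:
            rule["tdm-policy"] = policy_url
        rules.append(rule)
    return rules


# Hypothetical path pattern and policy URL.
rules = tdmrep_rules(["/articles/*"], "https://example.com/tdm-policy.json")
print(json.dumps(rules, indent=2))
```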
Consider an RSL license to set machine-readable terms (attribution / pay-per-crawl)
RSL 1.0 became an industry standard in 2025 (Reddit, Yahoo, Quora, Medium) for declaring AI usage terms and compensation.
Distinguish AI-training vs AI-search vs indexing in your policy
RSL's ai-all / ai-input / ai-index categories let you allow search citation while opting out of training.
Keep policy signals consistent across robots.txt, headers and meta tags
Different crawlers check different signals, so the same intent should appear in every place a bot might look.
Verify which AI crawlers can actually reach your site
Confirming real reachability catches firewall, WAF or CDN rules that silently block bots your robots.txt allows.
AI Crawler Accessibility Checker
Monitor server logs for AI bot visits (GPTBot, ClaudeBot, PerplexityBot)
Logs are the only proof of whether AI crawlers are fetching your content and how often.
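Counting AI bot visits can be sketched as a scan over access-log lines for the bot tokens named above. The log lines here are synthetic samples with simplified user-agent strings, not real traffic, and matching on the user-agent alone does not prove the request came from the real bot.

```python
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_ai_bot_hits(log_lines):
    """Count log lines that mention one of the AI bot tokens (case-insensitive)."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
                break
    return hits


# Synthetic sample lines for illustration only.
sample_log = [
    '192.0.2.10 - - [01/Jan/2026:00:00:01 +0000] "GET /docs/ HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '192.0.2.11 - - [01/Jan/2026:00:00:02 +0000] "GET /llms.txt HTTP/1.1" 200 128 "-" "ClaudeBot/1.0"',
    '192.0.2.10 - - [01/Jan/2026:00:00:03 +0000] "GET /guide/ HTTP/1.1" 200 900 "-" "GPTBot/1.0"',
    '192.0.2.12 - - [01/Jan/2026:00:00:04 +0000] "GET / HTTP/1.1" 200 300 "-" "Mozilla/5.0"',
]
hits = count_ai_bot_hits(sample_log)
print(dict(hits))
```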
AI Bot Crawl Log Parser
Verify real bots by reverse DNS / published IP ranges
User-agent strings are easily spoofed; legitimate GPTBot traffic originates from OpenAI infrastructure, and OpenAI publishes the IP ranges it uses.
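The IP-range half of this check can be sketched with the standard `ipaddress` module. The CIDR blocks below are reserved documentation ranges used as placeholders, not any vendor's real ranges; in practice you would load the list the vendor publishes.

```python
import ipaddress


def ip_in_published_ranges(ip, cidr_ranges):
    """True if the client IP falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


# Placeholder ranges (documentation blocks), NOT real crawler ranges.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]

print(ip_in_published_ranges("192.0.2.44", PLACEHOLDER_RANGES))
print(ip_in_published_ranges("198.51.100.7", PLACEHOLDER_RANGES))
```

A request whose user-agent claims to be an AI bot but whose IP fails this check is a candidate for spoofed traffic.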
AI Bot Crawl Log Parser
Confirm llms.txt and llms-full.txt return 200 (not 404/redirect)
A broken or redirected file means none of your llms.txt work is reaching models at all.
llms.txt Validator
Don't treat llms.txt as an access-control mechanism
No major AI vendor enforces llms.txt as permission; use robots.txt, headers and licensing for control.
The DarnItSEO Audit runs 75+ of these checks across your whole site.