
robots.txt and llms.txt: Controlling Bot Access

7 min read · January 10, 2026

The robots.txt file is a text file at your domain root that tells search engine crawlers which pages they can and cannot access. The newer llms.txt file serves a similar purpose for AI language model crawlers.

robots.txt Basics

Place robots.txt at your domain root (e.g., https://example.com/robots.txt). It uses a simple syntax:

# Allow all crawlers access to everything
User-agent: *
Allow: /

# Block all crawlers from admin pages
User-agent: *
Disallow: /admin/
Disallow: /private/

# Sitemap location
Sitemap: https://example.com/sitemap.xml

Key robots.txt Rules

  • User-agent — Specifies which crawler the rules apply to (* means all)
  • Disallow — Blocks the specified path from being crawled
  • Allow — Explicitly allows a path (useful for overriding broader Disallow rules)
  • Sitemap — Points to your XML sitemap location
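
Allow is most useful for carving an exception out of a broader Disallow. Major crawlers such as Googlebot and Bingbot resolve conflicts by the longest (most specific) matching rule, so the Allow wins here. A minimal sketch (the press-kit path is a hypothetical example):

```
# Block the /private/ directory, but permit one file inside it.
# The more specific Allow rule overrides the broader Disallow.
User-agent: *
Disallow: /private/
Allow: /private/press-kit.pdf
```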

Common Crawlers

Crawler          User-agent         Owner
Googlebot        Googlebot          Google
Bingbot          Bingbot            Microsoft
GPTBot           GPTBot             OpenAI
ClaudeBot        ClaudeBot          Anthropic
Google AI        Google-Extended    Google (AI training)
PerplexityBot    PerplexityBot      Perplexity

Controlling AI Bot Access

You can selectively allow or block AI crawlers:

# Allow search engines, block AI training
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow Perplexity (real-time citation)
User-agent: PerplexityBot
Allow: /

The llms.txt Standard

llms.txt is an emerging standard (proposed in 2024) that provides structured information about your site to AI assistants. Unlike robots.txt, which controls crawler access, llms.txt is a Markdown file at your domain root (e.g., https://example.com/llms.txt) that describes your site's content and policies.

# llms.txt
# Site: example.com
# Purpose: Help AI assistants understand our content

## About
We are an SEO tools company providing free
website analysis and optimization guides.

## Key Pages
- / : Homepage with SEO analyzer tool
- /blog/ : SEO tutorials and guides
- /dashboard/ : SEO analysis dashboard
- /schema/ : Schema markup generator

## Content Policies
- AI citation: Encouraged with link attribution
- Content scraping: Not permitted
- Training data: Opt-out (see robots.txt)

## Contact
- Website: https://example.com
- Support: support@example.com

Common robots.txt Mistakes

  • Blocking CSS/JS — Don't block stylesheets or scripts; Google needs them to render your pages
  • Blocking the entire site — A stray Disallow: / under User-agent: * blocks crawling of your whole site and will eventually drop it from search results
  • Using it for security — robots.txt is publicly readable and is NOT a security measure; crawlers can simply ignore it. Use proper authentication instead.
  • Forgetting the trailing slash — Disallow: /admin also blocks /admin-page and /administrator. Use Disallow: /admin/ to block only the directory.
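
The trailing slash matters because Disallow rules are path prefixes, not exact matches. For example:

```
# Prefix match: blocks /admin, /admin/, /admin-page, /administrator
Disallow: /admin

# Blocks only the /admin/ directory and everything under it
Disallow: /admin/
```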

Testing robots.txt

DarnItSEO checks your robots.txt configuration as part of its technical SEO analysis. You can also use the robots.txt report in Google Search Console to verify your rules.
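
You can also check rules locally with Python's standard-library robots.txt parser. A minimal sketch, using rules that mirror the examples above (the user agents and paths are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Block /admin/ for all crawlers, and block GPTBot entirely.
rules = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "/blog/post"))    # True
print(parser.can_fetch("Googlebot", "/admin/panel"))  # False
print(parser.can_fetch("GPTBot", "/blog/post"))       # False
```

For a live site, `RobotFileParser(url)` plus `read()` fetches and parses the file in one step.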

