llms.txt
Quick facts
- What it is: a proposed publishing convention (Jeremy Howard / Answer.AI, Sep 2024) consisting of one curated markdown file at /llms.txt that points an LLM at the clean pages to read first. It is a wayfinding hint, not an access or discovery file.
- The load-bearing distinction: supply-side adoption is real and rising (auto-generated by docs platforms; ~10% of sampled domains). Demand-side consumption is unconfirmed: no major AI vendor documents reading it.
- Officially consumed by AI vendors? No public confirmation, as of 2026-05. OpenAI / Anthropic / Google / Perplexity crawler docs are silent on llms.txt; Google's John Mueller likened it to the keywords meta tag (Reddit, 2025-04).
- Industry-standard? No. It is a proposed convention, not an IETF/W3C-ratified standard; governance is community-led (see the llms.txt Working Group).
- Where it sits: an upstream legibility/retrievability aid, not citability. It is GEO insurance (cheap and forward-compatible), not a GEO channel that moves citations today.
1. What llms.txt is — a publishing convention, not an access control
llms.txt is a single markdown file at site root (/llms.txt), proposed by Jeremy Howard at Answer.AI in September 2024 (see the original proposal).
Definition (GEO Wiki working definition): llms.txt is a proposed publishing convention — one curated, clean-markdown file that points an LLM at which pages to read first and where the clean text lives. It solves a legibility problem (HTML is noisy, context windows are finite), not an access problem and not a discovery problem.
The proposal’s own framing: “a markdown file that provides brief background information and guidance, along with links to markdown files providing more detailed information” (see Answer.AI).
This entry is the hub: it holds the standard and its honest status. The operational layers are routed out to the entries below, and that split is held throughout:
| This entry owns | Routed to |
|---|---|
| The format, and the honest supply-vs-demand status | — (this entry) |
| The doing — generate it, keep it fresh, per CMS/stack | Deploying llms.txt |
| The spec governance and who is adopting | llms.txt Working Group |
| The access policy this file is not | robots.txt |
| The crawler layer this is one file within | AI Crawlers |
Standard here, operation there — the same split Schema.org for AI draws against its playbooks. The GEO hub folds llms.txt under crawl/legibility plumbing rather than treating it as its own lever; this page is where that compression is unpacked honestly.
2. The load-bearing distinction — supply-side ≠ demand-side
This is the single highest-value disambiguation on the page, and this entry’s counterpart to Schema.org for AI §2’s “markup is not a signal.” Collapsing supply (sites publishing the file) into demand (AI engines consuming it) is where the false hope comes from.
| | Supply side | Demand side |
|---|---|---|
| What it means | Sites publish /llms.txt | AI engines fetch and use it at crawl/inference time |
| Status | Real and rising — auto-generated by docs platforms; a 300,000-domain study measured ~10% adoption (Search Engine Journal, 2025-11-20) | Unconfirmed — no major vendor documents reading third-party llms.txt |
| Evidence | Mintlify auto-hosts it; Anthropic, Google and Perplexity each publish one for their own docs | OpenAI (bots docs), Anthropic, Perplexity and Google crawler docs are silent on llms.txt |
The trap inside the evidence: Anthropic publishes an llms.txt for its own developer docs, yet ClaudeBot’s published crawler documentation does not commit to consuming one. Publishing ≠ consuming. The same holds for the GPTBot and PerplexityBot docs — silent on llms.txt.
The load-bearing line, stated plainly: llms.txt today is a low-cost, forward-compatible bet on a convention — not a citation channel you can rely on. The cost of publishing is ≈ 0; the confirmed payoff is not “AI vendors read it,” it is “any opt-in consumer reads a clean curated map cheaply now.”
A cautionary anchor, offered as a principle rather than a verdict: Google Search Advocate John Mueller publicly compared llms.txt to the old keywords meta tag, noting that “none of the AI services have said they’re using LLMs.TXT (and you can tell when you look at your server logs that they don’t even check for it)” (Reddit, April 2025, reported by Search Engine Journal). The skepticism is real and represented here; §7 sizes the narrower case it does not refute.
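Mueller’s server-log point can be checked on your own site. The following is a minimal sketch, assuming access logs in the combined log format and a hand-picked list of user-agent substrings. GPTBot, ClaudeBot and PerplexityBot are the crawlers named above; the regular expression, the function name `count_llms_txt_fetches` and the sample log lines are illustrative assumptions, not any vendor’s documented behavior.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent header. Assumption: this list is
# illustrative, not exhaustive; extend it for the crawlers you care about.
AI_AGENT_MARKERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined log format: ... "METHOD PATH PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_fetches(log_lines):
    """Count requests for /llms.txt per AI user-agent marker."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not combined-format entries
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path != "/llms.txt":
            continue
        for marker in AI_AGENT_MARKERS:
            if marker in match.group("agent"):
                hits[marker] += 1
    return hits


# Hypothetical sample lines, invented for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/May/2026:10:00:05 +0000] "GET /docs/quickstart HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/May/2026:10:00:09 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "curl/8.5.0"',
]

if __name__ == "__main__":
    print(dict(count_llms_txt_fetches(SAMPLE_LOG)))
```

Read the result narrowly: a zero count over a long window says none of the listed crawlers requested the file, while a nonzero count shows a fetch, not that the content was used.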
3. The canonical file format — the site-quoted asset
This is the entry’s load-bearing asset: the spec the rest of the site quotes. The standard is owned here; per-stack generation is routed to Deploying llms.txt. Per llmstxt.org, the structure is:
```markdown
# Project Name

> A short blockquote summarising the project: the key info needed
> to make sense of the rest of the file.

Zero or more free prose paragraphs (no headings) for extra context.

## Docs

- [Quickstart](https://example.com/quickstart.md): how to get started
- [API reference](https://example.com/api.md): the full endpoint list

## Optional

- [Changelog](https://example.com/changelog.md): can be skipped if the
  context window is tight
```
| Element | Required? | What it is for |
|---|---|---|
| `#` H1 project name | Yes (the only required element) | Names the entity the file describes |
| `>` blockquote summary | Recommended | One-line gist; key info to read the rest |
| Free prose paragraphs | Optional | Extra context, no headings |
| `##` H2 link-list sections | Optional, repeatable | Curated link lists, each `[name](url): note` |
| `## Optional` section | Optional | Lower-priority links an LLM may skip under token pressure (llmstxt.org) |
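To make the element table concrete, here is a minimal parsing sketch. It assumes only the structure described above (one H1, an optional blockquote, H2 sections holding `[name](url): note` links). The names `LlmsTxt` and `parse_llms_txt` are hypothetical, and the parser is a simplification, not the reference implementation from llmstxt.org.

```python
import re
from dataclasses import dataclass, field

# One list item: "- [name](url)" with an optional ": note" suffix.
LINK = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # H2 section name -> list of (name, url, note) tuples
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text):
    """Parse llms.txt text into title, summary and link sections."""
    doc = LlmsTxt()
    current = None
    summary_parts = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith(">") and current is None:
            summary_parts.append(line[1:].strip())
        elif current is not None:
            match = LINK.match(line)
            if match:
                doc.sections[current].append(
                    (match.group("name"), match.group("url"), match.group("note") or "")
                )
            elif line and doc.sections[current]:
                # A wrapped note line: append it to the previous link's note.
                name, url, note = doc.sections[current][-1]
                doc.sections[current][-1] = (name, url, f"{note} {line}".strip())
        # Free prose before the first H2 is ignored in this sketch.
    doc.summary = " ".join(summary_parts)
    if not doc.title:
        raise ValueError("llms.txt needs an H1 title, the only required element")
    return doc


EXAMPLE = """# Project Name

> A short blockquote summarising the project.

## Docs

- [Quickstart](https://example.com/quickstart.md): how to get started

## Optional

- [Changelog](https://example.com/changelog.md): can be skipped if the
  context window is tight
"""

if __name__ == "__main__":
    parsed = parse_llms_txt(EXAMPLE)
    print(parsed.title, list(parsed.sections))
```

A consumer under token pressure could drop `sections["Optional"]` before fetching anything, which is the behavior the last table row describes.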
llms.txt vs llms-full.txt — name the distinction precisely, because the popular name is not the spec’s. The official proposal defines processed expansions llms-ctx.txt and llms-ctx-full.txt, generated from llms.txt by the llms_txt2ctx tool (see Answer.AI). The widely deployed file literally named llms-full.txt — the full text of all docs concatenated into one file — is a Mintlify-popularized de-facto convention, not the original spec (Mintlify, 2024-11-20). The mental model that survives both: llms.txt = a curated index (wayfinding); the full variant = a drop-in content payload (and a context-window risk — see §6).
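The index-versus-payload mental model can be sketched from a single list of pages. This is an illustration under stated assumptions: the page records, the function names `render_index` and `render_full`, and the concatenation layout are all hypothetical. The source only says the full variant is the text of all docs in one file, and this sketch is not the `llms_txt2ctx` tool.

```python
# Hypothetical page records; in practice these would come from your build.
PAGES = [
    {
        "title": "Quickstart",
        "url": "https://example.com/quickstart.md",
        "note": "how to get started",
        "body": "Install the package, then run the hello-world example.",
    },
    {
        "title": "API reference",
        "url": "https://example.com/api.md",
        "note": "the full endpoint list",
        "body": "GET /items returns every item. POST /items creates one.",
    },
]


def render_index(project, summary, pages):
    """Curated index: links and one-line notes only (wayfinding)."""
    lines = [f"# {project}", "", f"> {summary}", "", "## Docs", ""]
    for page in pages:
        lines.append(f"- [{page['title']}]({page['url']}): {page['note']}")
    return "\n".join(lines) + "\n"


def render_full(project, pages):
    """Full variant: every page body concatenated (content payload)."""
    parts = [f"# {project}"]
    for page in pages:
        parts.append(f"## {page['title']}\n\n{page['body'].strip()}")
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    print(render_index("Project Name", "A short summary.", PAGES))
    print(render_full("Project Name", PAGES))
```

The index grows with the number of links, while the full variant grows with the total size of the docs, which is where the context-window risk comes from.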
4. llms.txt vs robots.txt vs sitemap.xml — three files, three jobs
The single most common category error is conflating these three root files. They do not overlap; none substitutes for another.
| File | Its job | What it does not do |
|---|---|---|
| robots.txt | Access control: may a bot fetch this path (see robots.txt) | Does not curate, render, or rank; it is a request, not enforcement |
| sitemap.xml | Discovery / completeness: here is everything, for indexing (Sitemap & IndexNow) | Does not curate or grant access; not a “best of” list |
| llms.txt | Curation + clean rendering: read these first, in clean markdown | Does not grant/deny access, claim completeness, or act as a ranking signal |
The load-bearing line: llms.txt is not “a sitemap for AI” (that conflates curation with completeness) and not “robots.txt for AI” (that conflates a reading hint with an access rule). The access decision is the job of robots.txt and the broader AI Crawlers layer; llms.txt never grants or blocks a fetch.
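A short sketch of that division of labor, using Python’s standard `urllib.robotparser`. The robots.txt rules, the URLs and the helper name `may_fetch` are invented for illustration. The point is that a well-behaved crawler’s access answer comes from robots.txt alone, and listing a URL in llms.txt does not change it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

# Hypothetical URLs that an llms.txt might list as "read these first".
LLMS_TXT_LINKS = [
    "https://example.com/docs/quickstart.md",
    "https://example.com/private/report.md",
]


def may_fetch(robots_txt, user_agent, url):
    """Answer the access question from robots.txt only."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in LLMS_TXT_LINKS:
        # Being listed in llms.txt is a reading hint; it plays no part here.
        print(url, may_fetch(ROBOTS_TXT, "GPTBot", url))
```

As the table notes, robots.txt is itself a request rather than enforcement, so this models a compliant crawler only.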
5. What llms.txt does and does not do — the bounded reading
The honesty section, carried with the same discipline as Citability §5 and Schema.org for AI §6.
| llms.txt does | llms.txt does not |
|---|---|
| Offer a clean, token-cheap, curated entry point | Grant or deny crawler access (≠ robots.txt) |
| Help opt-in consumers you control read the right pages cheaply today | Block training, or guarantee crawling, indexing, or citation |
| Position you, at ~0 cost, for vendor consumption if it arrives | Get read by browsers or end users |
| Describe site structure for any tool that chooses to honor it | Act as a ranking or citation signal |
| What’s true | The bounded reading |
|---|---|
| Adoption is rising; docs platforms auto-generate it | That is supply, not demand — publishing is not consumption |
| Anthropic, Google and Perplexity publish an llms.txt | They host one for their own docs; their crawlers are not documented to read yours |
| A 300k-domain study measured ~10% adoption | Real momentum, no measured citation effect — “not yet” (SEJ, 2025-11-20) |
| A 90-day, 10-site study tracked AI traffic before/after | The study recommended treating it as infrastructure, like sitemaps, not as a growth strategy (Search Engine Land, 2026-01-20) |
Whether a given engine consumes it is a per-platform question, routed — not adjudicated — to ChatGPT Search, Perplexity AI and Claude. As of 2026-05 none documents llms.txt consumption.
6. Anti-patterns — when llms.txt backfires or wastes effort
Each pattern looks right and fails because it confuses the file’s job, its freshness contract, or its token economy.
| Anti-pattern | Why it looks right | Why it actually fails |
|---|---|---|
| Using llms.txt to “block AI training on my site” | It is an AI-facing file | The headline category error: it is a reading hint, not access control — the file for that is robots.txt |
| Expecting “publish llms.txt → get cited” | Other AI files affect visibility | No confirmed consumption (§2); a bet, not a lever — citation is earned in the page (Citability) |
| A stale llms.txt that drifts from the live site | It was correct when written | A wrong curated map is worse than none — it points opt-in agents at dead/old URLs |
| Dumping the whole sitemap into llms.txt | “More links = more coverage” | Defeats the curation purpose; llms.txt is selection, sitemap.xml is completeness |
| llms-full.txt bloated past context windows / full of nav chrome | “Give the model everything” | Destroys the token-economy win, which is the entire point of the file |
| Hand-maintaining it where it should be build-generated | “It rarely changes” | Drift is then guaranteed; route the fix to Deploying llms.txt |
The load-bearing line: the entire value of llms.txt is curation plus freshness; an uncurated or unmaintained llms.txt has negative value — it spends opt-in consumers’ trust pointing them wrong.
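Two of these anti-patterns, stale links and the sitemap dump, are easy to lint for at build time. The sketch below assumes you can supply the set of live URLs yourself (for example from your sitemap or build output). The function name `lint_llms_txt` and the `max_share` threshold of 0.5 are arbitrary illustrative choices, not part of any spec.

```python
import re

# Captures the URL from every markdown link "[name](url)".
LINK_URL = re.compile(r"\[[^\]]+\]\(([^)\s]+)\)")


def lint_llms_txt(llms_txt, live_urls, max_share=0.5):
    """Flag dead links and a link list that covers most of the site.

    max_share is an assumed threshold: above it, the file looks more like
    a sitemap (completeness) than a curated selection.
    """
    listed = LINK_URL.findall(llms_txt)
    live = set(live_urls)
    dead = [url for url in listed if url not in live]
    share = len(set(listed) & live) / len(live) if live else 0.0
    return {
        "dead_links": dead,
        "sitemap_share": share,
        "looks_like_sitemap_dump": share > max_share,
    }


# Hypothetical inputs, for illustration only.
SAMPLE_LLMS_TXT = """# Project Name

## Docs

- [Quickstart](https://example.com/quickstart.md): how to get started
- [Old guide](https://example.com/old.md): removed from the live site
"""

SAMPLE_LIVE_URLS = [
    "https://example.com/quickstart.md",
    "https://example.com/api.md",
    "https://example.com/changelog.md",
    "https://example.com/faq.md",
]

if __name__ == "__main__":
    print(lint_llms_txt(SAMPLE_LLMS_TXT, SAMPLE_LIVE_URLS))
```

Running a check like this in the build is one way to catch drift before an opt-in consumer does.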
7. Why this matters for GEO — the bet, sized honestly
This restates SEO vs GEO’s invariant-baseline-vs-speculative-edge contract rather than re-deriving it; the discipline is to size the case without over-claiming.
The rational case is context-window economics. A clean, curated entry point lowers the read-cost for any opt-in LLM consumer now — your own RAG, AI coding agents, third-party tools that choose to honor it — and is forward-compatible if vendor consumption lands later. Cost to publish (especially auto-generated) ≈ 0; downside ≈ 0; the optionality is real.
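The read-cost argument can be approximated with simple arithmetic. This sketch assumes a rough heuristic of about four characters per token, which is an approximation and not any vendor’s tokenizer. The sample pages and the names `rough_tokens` and `read_cost_saving` are invented for illustration.

```python
def rough_tokens(text, chars_per_token=4):
    """Very rough token estimate; the 4-chars-per-token ratio is an assumption."""
    return max(1, len(text) // chars_per_token)


def read_cost_saving(html_page, markdown_page):
    """Return (html_cost, markdown_cost, fractional saving) in rough tokens."""
    html_cost = rough_tokens(html_page)
    md_cost = rough_tokens(markdown_page)
    return html_cost, md_cost, 1 - md_cost / html_cost


# Hypothetical versions of the same page, for illustration only.
HTML_PAGE = (
    "<html><head><title>Quickstart</title><script src='/app.js'></script></head>"
    "<body><nav><a href='/'>Home</a><a href='/docs'>Docs</a><a href='/blog'>Blog</a></nav>"
    "<main><h1>Quickstart</h1><p>Install the package, then run the example.</p></main>"
    "<footer><a href='/privacy'>Privacy</a><a href='/terms'>Terms</a></footer></body></html>"
)
MARKDOWN_PAGE = "# Quickstart\n\nInstall the package, then run the example.\n"

if __name__ == "__main__":
    html_cost, md_cost, saving = read_cost_saving(HTML_PAGE, MARKDOWN_PAGE)
    print(f"html ~{html_cost} tokens, markdown ~{md_cost} tokens, saving {saving:.0%}")
```

The absolute numbers are not meaningful; only the direction is. The same content costs fewer tokens once the navigation, script and footer markup are gone.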
The honest sizing, this entry’s spine restated: llms.txt is GEO insurance, not a GEO channel. Do it because it is cheap and forward-compatible, not because it moves citations today: neither of the two independent measurements cited here found a citation effect (SEJ, 300k domains; Search Engine Land, 10 sites). Where it sits in the loop: it is an upstream legibility/retrievability aid, strictly upstream of Citability (being liftable once read) and not a substitute for it. The loop mechanics sit in Answer Loop.
The balanced reading, for parity with how Generative Engine Optimization cites both sides: proponents argue the convention has “a long road ahead, but I wouldn’t bet against it” (Search Engine Land, 2025-03-28); skeptics (§2) note robots.txt and sitemaps already cover much of the need. Both can hold — which is exactly why this is a low-cost bet, not a strategy.
8. How to act + governance
| Your intent | First stop |
|---|---|
| Generate and deploy it per stack | Deploying llms.txt |
| The spec governance, adopters, and standardization path | llms.txt Working Group |
| Write the actual access policy (the file this is not) | robots.txt |
| The crawler layer this is one file within | AI Crawlers |
| The discovery/completeness file it is not a replacement for | Sitemap & IndexNow |
| Whether a given engine consumes it | ChatGPT Search · Perplexity AI · Claude |
| The next gate once a page is read | Citability |
| The method that ties it together | Generative Engine Optimization |
One line, routed not expanded: publish it because it is cheap and forward-compatible — but write the policy in robots.txt and earn citation in the page itself. llms.txt is the map you offer, not the access you grant or the citation you win.
References
The proposal & specification:
- Answer.AI — The /llms.txt file: a proposal to help LLMs use websites (Jeremy Howard, 2024-09-03)
- llmstxt.org — the /llms.txt file specification
Vendor crawler documentation (silent on llms.txt consumption, as of 2026-05):
- OpenAI — Overview of OpenAI Crawlers
- Anthropic — Does Anthropic crawl data from the web, and how can site owners block the crawler?
- Perplexity — Perplexity Crawlers
- Google Search Central — Overview of Google crawlers and fetchers
Supply-side adoption & the variant convention:
- Mintlify — Simplifying docs for AI with /llms.txt (2024-11-20)
Skepticism & independent measurement:
- Search Engine Journal — Google Says LLMs.Txt Comparable To Keywords Meta Tag (John Mueller, Reddit; 2025-04-17)
- Search Engine Journal — llms.txt Shows No Clear Effect On AI Citations Based On 300K Domains (2025-11-20)
- Search Engine Land — Does llms.txt matter? A 90-day study across 10 sites (2026-01-20)
Balanced explainer:
- Search Engine Land — Meet llms.txt, a proposed standard for AI website content crawling (2025-03-28)
Frequently asked questions
Does publishing llms.txt get my content cited by AI?
Do ChatGPT, Claude, Perplexity, or Gemini read my llms.txt?
Is llms.txt a replacement for robots.txt or sitemap.xml?
Is llms.txt an official standard?
Should I bother publishing llms.txt then?
Sources
Primary
- The /llms.txt file — a proposal to provide information to help LLMs use websites · Answer.AI · 2024-09-03
- The /llms.txt file — specification · Answer.AI · 2024-09-03
- Overview of OpenAI Crawlers (GPTBot / OAI-SearchBot / ChatGPT-User) · OpenAI
- Does Anthropic crawl data from the web, and how can site owners block the crawler? · Anthropic · 2026-04-07
- Perplexity Crawlers (PerplexityBot / Perplexity-User) · Perplexity AI
- Overview of Google crawlers and fetchers (user agents) · Google Search Central · 2026-02-09
- Simplifying docs for AI with /llms.txt · Mintlify · 2024-11-20
Secondary
- Google Says LLMs.Txt Comparable To Keywords Meta Tag · Search Engine Journal
- llms.txt Shows No Clear Effect On AI Citations Based On 300K Domains · Search Engine Journal
- Does llms.txt matter? A 90-day study across 10 sites · Search Engine Land