
llms.txt

Quick facts

What it is: A proposed publishing convention (Jeremy Howard / Answer.AI, Sep 2024): one curated markdown file at /llms.txt pointing an LLM at the clean pages to read first. A wayfinding hint, not an access or discovery file
The load-bearing distinction: Supply-side adoption is real and rising (auto-generated by docs platforms; ~10% of sampled domains). Demand-side consumption is unconfirmed — no major AI vendor documents reading it
Officially consumed by AI vendors?: No public confirmation, as of 2026-05. OpenAI / Anthropic / Google / Perplexity crawler docs are silent on llms.txt; Google's John Mueller likened it to the keywords meta tag (Reddit, 2025-04)
Industry-standard?: No. A proposed convention, not an IETF/W3C-ratified standard; governance is community-led — see the llms.txt Working Group
Where it sits: An upstream legibility/retrievability aid, not citability. It is GEO insurance — cheap and forward-compatible — not a GEO channel that moves citations today

1. What llms.txt is — a publishing convention, not an access control

llms.txt is a single markdown file at site root (/llms.txt), proposed by Jeremy Howard at Answer.AI in September 2024 (see the original proposal).

Definition (GEO Wiki working definition): llms.txt is a proposed publishing convention — one curated, clean-markdown file that points an LLM at which pages to read first and where the clean text lives. It solves a legibility problem (HTML is noisy, context windows are finite), not an access problem and not a discovery problem.

The proposal’s own framing: “a markdown file that provides brief background information and guidance, along with links to markdown files providing more detailed information” (see Answer.AI).

This entry is the hub: the standard plus its honest status. The operational layers are routed out to the entries below, and that split is held throughout:

This entry owns	Routed to
The format, and the honest supply-vs-demand status	— (this entry)
The doing — generate it, keep it fresh, per CMS/stack	Deploying llms.txt
The spec governance and who is adopting	llms.txt Working Group
The access policy this file is not	robots.txt
The crawler layer this is one file within	AI Crawlers

Standard here, operation there — the same split Schema.org for AI draws against its playbooks. The GEO hub folds llms.txt under crawl/legibility plumbing rather than treating it as its own lever; this page is where that compression is unpacked honestly.

2. The load-bearing distinction — supply-side ≠ demand-side

This is the single highest-value disambiguation on the page, and this entry’s counterpart to Schema.org for AI §2’s “markup is not a signal.” Collapsing supply (sites publishing the file) into demand (AI engines consuming it) is where the false hope comes from.

	Supply side	Demand side
What it means	Sites publish `/llms.txt`	AI engines fetch and use it at crawl/inference time
Status	Real and rising — auto-generated by docs platforms; a 300,000-domain study measured ~10% adoption (Search Engine Journal, 2025-11-20)	Unconfirmed — no major vendor documents reading third-party llms.txt
Evidence	Mintlify auto-hosts it; Anthropic, Google and Perplexity each publish one for their own docs	OpenAI (bots docs), Anthropic, Perplexity and Google crawler docs are silent on llms.txt

The trap inside the evidence: Anthropic publishes an llms.txt for its own developer docs, yet ClaudeBot’s published crawler documentation does not commit to consuming one. Publishing ≠ consuming. The same holds for the GPTBot and PerplexityBot docs — silent on llms.txt.

The load-bearing line, stated plainly: llms.txt today is a low-cost, forward-compatible bet on a convention — not a citation channel you can rely on. The cost of publishing is ≈ 0; the confirmed payoff is not “AI vendors read it,” it is “any opt-in consumer reads a clean curated map cheaply now.”

The cautionary anchor, carried as a principle, not a verdict: Google Search Advocate John Mueller publicly compared llms.txt to the old keywords meta tag, writing that “none of the AI services have said they’re using LLMs.TXT (and you can tell when you look at your server logs that they don’t even check for it)” (Reddit, April 2025, reported by Search Engine Journal). The skepticism is real and represented here; §7 sizes the narrower case it does not refute.
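Mueller's server-log point can be checked on your own site. The following is a minimal sketch, assuming access logs in the common Combined Log Format. The function name `llms_txt_hits`, the hypothetical `ExampleBot` agent, and the sample log lines are illustrative only; the default user-agent tokens are simply the crawler names this entry mentions, matched as substrings.

```python
import re
from collections import Counter

# Assumed layout (Combined Log Format):
# ip - - [time] "METHOD /path HTTP/x" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Crawler names mentioned in this entry; matched as case-insensitive substrings.
BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def llms_txt_hits(log_lines, bots=BOT_TOKENS):
    """Count requests for /llms.txt, grouped by matching bot token or 'other'."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # not a line in the assumed format
        path = match["path"].split("?", 1)[0]  # ignore any query string
        if path != "/llms.txt":
            continue
        agent = match["ua"].lower()
        label = next((b for b in bots if b.lower() in agent), "other")
        hits[label] += 1
    return hits


# Synthetic sample lines, not real traffic.
SAMPLE_LOG = [
    '198.51.100.7 - - [10/May/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleBot/0.1"',
    '203.0.113.9 - - [10/May/2026:10:01:00 +0000] "GET /llms.txt?x=1 HTTP/1.1" 200 512 "-" "curl/8.0"',
    '198.51.100.7 - - [10/May/2026:10:02:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "ExampleBot/0.1"',
]

if __name__ == "__main__":
    print(llms_txt_hits(SAMPLE_LOG, bots=("ExampleBot",)))
```

A zero count for a given crawler only says it did not request the file in that log window; it says nothing about what a vendor does elsewhere.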

3. The canonical file format — the site-quoted asset

This is the entry’s load-bearing asset: the spec the rest of the site quotes. The standard is owned here; per-stack generation is routed to Deploying llms.txt. Per llmstxt.org, the structure is:

# Project Name

> A short blockquote summarising the project — the key info needed
> to make sense of the rest of the file.

Zero or more free prose paragraphs (no headings) for extra context.

## Docs

- [Quickstart](https://example.com/quickstart.md): how to get started
- [API reference](https://example.com/api.md): the full endpoint list

## Optional

- [Changelog](https://example.com/changelog.md): can be skipped if the
  context window is tight

Element	Required?	What it is for
`# H1` project name	Yes — the only required element	Names the entity the file describes
`> blockquote` summary	Recommended	One-line gist; key info to read the rest
Free prose paragraphs	Optional	Extra context, no headings
`## H2` link-list sections	Optional, repeatable	Curated link lists, each `[name](url): note`
`## Optional` section	Optional	Lower-priority links an LLM may skip under token pressure (llmstxt.org)
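To make the structure concrete, here is a minimal parsing sketch of the elements in the table above. It is not the reference parser from llmstxt.org: the names `LlmsTxt`, `parse_llms_txt` and `LINK_RE` are hypothetical, free prose paragraphs are ignored, and the handling of wrapped link notes is an assumption based on the example file.

```python
import re
from dataclasses import dataclass, field

# One link-list item: "- [name](url): optional note"
LINK_RE = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section name -> list of (name, url, note) tuples
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # H2 section currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith(">") and current is None:
            # Blockquote summary, possibly spread over several lines
            doc.summary = (doc.summary + " " + line[1:].strip()).strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                note = (match["note"] or "").strip()
                doc.sections[current].append((match["name"], match["url"], note))
            elif line and not line.startswith("-") and doc.sections[current]:
                # Assumed: an indented continuation of the previous note
                name, url, note = doc.sections[current][-1]
                doc.sections[current][-1] = (name, url, (note + " " + line).strip())
    if not doc.title:
        raise ValueError("llms.txt requires an H1 project name")
    return doc


SAMPLE = """# Project Name

> A short summary of the project.

## Docs

- [Quickstart](https://example.com/quickstart.md): how to get started

## Optional

- [Changelog](https://example.com/changelog.md): can be skipped if the
  context window is tight
"""

if __name__ == "__main__":
    print(parse_llms_txt(SAMPLE))
```

Keeping `Optional` as an ordinary section name leaves the decision to skip it with the consumer, which matches how the table describes it.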

llms.txt vs llms-full.txt — name the distinction precisely, because the popular name is not the spec’s. The official proposal defines processed expansions llms-ctx.txt and llms-ctx-full.txt, generated from llms.txt by the llms_txt2ctx tool (see Answer.AI). The widely deployed file literally named llms-full.txt — the full text of all docs concatenated into one file — is a Mintlify-popularized de-facto convention, not the original spec (Mintlify, 2024-11-20). The mental model that survives both: llms.txt = a curated index (wayfinding); the full variant = a drop-in content payload (and a context-window risk — see §6).

4. llms.txt vs robots.txt vs sitemap.xml — three files, three jobs

The single most common category error is conflating these three root files. They do not overlap; none substitutes for another.

File	Its job	What it does not do
`robots.txt`	Access control — may a bot fetch this path (see robots.txt)	Does not curate, render, or rank; it is a request, not enforcement
`sitemap.xml`	Discovery / completeness — here is everything, for indexing (Sitemap & IndexNow)	Does not curate or grant access; not a “best of” list
`llms.txt`	Curation + clean rendering — read these first, in clean markdown	Does not grant/deny access, claim completeness, or act as a ranking signal

The load-bearing line: llms.txt is not “a sitemap for AI” (that conflates curation with completeness) and not “robots.txt for AI” (that conflates a reading hint with an access rule). The access decision is the job of robots.txt and the broader AI Crawlers layer; llms.txt never grants or blocks a fetch.
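The separation can be sketched from the side of a well-behaved opt-in consumer: the links come from llms.txt, but whether each one may be fetched is still decided by robots.txt. This is an illustrative sketch using Python's standard `urllib.robotparser`, not documented behaviour of any vendor; `ExampleBot`, the sample rules and the helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical access policy for the site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""

# Hypothetical URLs a consumer found listed in the site's llms.txt.
LLMS_TXT_LINKS = [
    "https://example.com/docs/quickstart.md",
    "https://example.com/private/notes.md",
]


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Ask robots.txt (and only robots.txt) whether this agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetchable_links(robots_txt: str, user_agent: str, urls) -> list:
    """Keep only the llms.txt links that the access policy allows."""
    return [u for u in urls if may_fetch(robots_txt, user_agent, u)]


if __name__ == "__main__":
    print(fetchable_links(ROBOTS_TXT, "ExampleBot", LLMS_TXT_LINKS))
```

Being listed in llms.txt changes nothing about the second URL: for `ExampleBot` it stays disallowed because robots.txt says so.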

5. What llms.txt does and does not do — the bounded reading

The honesty section, carried with the same discipline as Citability §5 and Schema.org for AI §6.

llms.txt does	llms.txt does not
Offer a clean, token-cheap, curated entry point	Grant or deny crawler access (≠ robots.txt)
Help opt-in consumers you control read the right pages cheaply today	Block training, or guarantee crawling, indexing, or citation
Position you, at ~0 cost, for vendor consumption if it arrives	Get read by browsers or end users
Describe site structure for any tool that chooses to honor it	Act as a ranking or citation signal

What’s true	The bounded reading
Adoption is rising; docs platforms auto-generate it	That is supply, not demand — publishing is not consumption
Anthropic, Google and Perplexity publish an llms.txt	They host one for their own docs; their crawlers are not documented to read yours
A 300k-domain study measured ~10% adoption	Real momentum, no measured citation effect — “not yet” (SEJ, 2025-11-20)
A 90-day, 10-site study tracked AI traffic before/after	The study recommended treating it as infrastructure, like sitemaps, not as a growth strategy (Search Engine Land, 2026-01-20)

Whether a given engine consumes it is a per-platform question, routed — not adjudicated — to ChatGPT Search, Perplexity AI and Claude. As of 2026-05 none documents llms.txt consumption.

6. Anti-patterns — when llms.txt backfires or wastes effort

Each pattern looks right and fails because it confuses the file’s job, its freshness contract, or its token economy.

Anti-pattern	Why it looks right	Why it actually fails
Using llms.txt to “block AI training on my site”	It is an AI-facing file	The headline category error: it is a reading hint, not access control — the file for that is robots.txt
Expecting “publish llms.txt → get cited”	Other AI files affect visibility	No confirmed consumption (§2); a bet, not a lever — citation is earned in the page (Citability)
A stale llms.txt that drifts from the live site	It was correct when written	A wrong curated map is worse than none — it points opt-in agents at dead/old URLs
Dumping the whole sitemap into llms.txt	“More links = more coverage”	Defeats the curation purpose; llms.txt is selection, sitemap.xml is completeness
`llms-full.txt` bloated past context windows / full of nav chrome	“Give the model everything”	Destroys the token-economy win, which is the entire point of the file
Hand-maintaining it where it should be build-generated	“It rarely changes”	Drift is then guaranteed → route the fix to Deploying llms.txt

The load-bearing line: the entire value of llms.txt is curation plus freshness; an uncurated or unmaintained llms.txt has negative value — it spends opt-in consumers’ trust pointing them wrong.
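Two of the anti-patterns above, drift and the sitemap dump, lend themselves to a build-time check. The sketch below assumes the build can supply the set of URLs that currently exist on the site; `audit_llms_txt` is a hypothetical helper, and the 50% threshold for "looks like a sitemap dump" is an arbitrary assumption to tune, not a figure from any spec or study.

```python
import re

# Any markdown link "[name](url)" anywhere in the file.
LINK_URL_RE = re.compile(r"\[[^\]]+\]\((?P<url>[^)\s]+)\)")


def audit_llms_txt(llms_txt: str, live_urls, max_share: float = 0.5) -> dict:
    """Report links that no longer exist and how much of the site is listed."""
    live = set(live_urls)
    linked = {m["url"] for m in LINK_URL_RE.finditer(llms_txt)}
    dead = sorted(linked - live)  # drift: curated map points at missing pages
    share = len(linked & live) / len(live) if live else 0.0
    return {
        "dead_links": dead,
        "share_of_site": share,
        # Assumed heuristic: listing most of the site is completeness, not curation
        "looks_like_sitemap_dump": share > max_share,
    }


SAMPLE_LLMS_TXT = """# Project Name

## Docs

- [Quickstart](https://example.com/quickstart.md): how to get started
- [Old guide](https://example.com/old-guide.md): removed last release
"""

SAMPLE_LIVE_URLS = {
    "https://example.com/quickstart.md",
    "https://example.com/api.md",
    "https://example.com/changelog.md",
    "https://example.com/faq.md",
}

if __name__ == "__main__":
    print(audit_llms_txt(SAMPLE_LLMS_TXT, SAMPLE_LIVE_URLS))
```

Failing the build on a non-empty `dead_links` list is one way to keep the file from drifting silently.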

7. Why this matters for GEO — the bet, sized honestly

This restates SEO vs GEO’s invariant-baseline-vs-speculative-edge contract rather than re-deriving it; the discipline is to size the case without over-claiming.

The rational case is context-window economics. A clean, curated entry point lowers the read-cost for any opt-in LLM consumer now — your own RAG, AI coding agents, third-party tools that choose to honor it — and is forward-compatible if vendor consumption lands later. Cost to publish (especially auto-generated) ≈ 0; downside ≈ 0; the optionality is real.
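The economics can be illustrated with a rough reading plan: an opt-in consumer with a fixed token budget reads the curated pages first and takes `Optional` pages only if room remains. This is a sketch under loud assumptions. The four-characters-per-token figure is a crude rule of thumb for English text rather than any model's tokenizer, and `rough_tokens` and `plan_reading` are hypothetical names.

```python
import math


def rough_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude size estimate; an assumption, not a real tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def plan_reading(required: dict, optional: dict, budget_tokens: int):
    """Pick pages (url -> text) greedily: curated pages first, Optional last.

    Any page that would exceed the remaining budget is skipped.
    Returns (chosen_urls, tokens_used).
    """
    chosen, used = [], 0
    for group in (required, optional):
        for url, text in group.items():
            cost = rough_tokens(text)
            if used + cost <= budget_tokens:
                chosen.append(url)
                used += cost
    return chosen, used


if __name__ == "__main__":
    required = {"quickstart.md": "x" * 400, "api.md": "x" * 800}
    optional = {"changelog.md": "x" * 400}
    print(plan_reading(required, optional, budget_tokens=350))
```

The same arithmetic explains the `llms-full.txt` risk: a payload whose estimate exceeds the consumer's budget cannot be read whole, however clean it is.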

The honest sizing, this entry’s spine restated: llms.txt is GEO insurance, not a GEO channel. Do it because it is cheap and forward-compatible, not because it moves citations today: the two largest measurements found no citation effect (SEJ, 300k domains; Search Engine Land, 10 sites). Where it sits in the loop: an upstream legibility/retrievability aid that comes strictly before Citability (being liftable once read) and does not replace it. The loop mechanics sit in Answer Loop.

The balanced reading, for parity with how Generative Engine Optimization cites both sides: proponents argue the convention has “a long road ahead, but I wouldn’t bet against it” (Search Engine Land, 2025-03-28); skeptics (§2) note robots.txt and sitemaps already cover much of the need. Both can hold — which is exactly why this is a low-cost bet, not a strategy.

8. How to act + governance

Your intent	First stop
Generate and deploy it per stack	Deploying llms.txt
The spec governance, adopters, and standardization path	llms.txt Working Group
Write the actual access policy (the file this is not)	robots.txt
The crawler layer this is one file within	AI Crawlers
The discovery/completeness file it is not a replacement for	Sitemap & IndexNow
Whether a given engine consumes it	ChatGPT Search · Perplexity AI · Claude
The next gate once a page is read	Citability
The method that ties it together	Generative Engine Optimization

One line, routed not expanded: publish it because it is cheap and forward-compatible — but write the policy in robots.txt and earn citation in the page itself. llms.txt is the map you offer, not the access you grant or the citation you win.

References

The proposal & specification:

Answer.AI — The /llms.txt file: a proposal to help LLMs use websites (Jeremy Howard, 2024-09-03)
llmstxt.org — the /llms.txt file specification

Vendor crawler documentation (silent on llms.txt consumption, as of 2026-05):

OpenAI — Overview of OpenAI Crawlers
Anthropic — Does Anthropic crawl data from the web, and how can site owners block the crawler?
Perplexity — Perplexity Crawlers
Google Search Central — Overview of Google crawlers and fetchers

Supply-side adoption & the variant convention:

Mintlify — Simplifying docs for AI with /llms.txt (2024-11-20)

Skepticism & independent measurement:

Search Engine Journal — Google Says LLMs.Txt Comparable To Keywords Meta Tag (John Mueller, Reddit; 2025-04-17)
Search Engine Journal — llms.txt Shows No Clear Effect On AI Citations Based On 300K Domains (2025-11-20)
Search Engine Land — Does llms.txt matter? A 90-day study across 10 sites (2026-01-20)

Balanced explainer:

Search Engine Land — Meet llms.txt, a proposed standard for AI website content crawling (2025-03-28)

Frequently asked questions

Does publishing llms.txt get my content cited by AI?

Not on current evidence. There is no public confirmation that any major AI engine consumes /llms.txt, and the two largest measurements found no citation effect — a 300,000-domain study concluded it 'doesn't seem to directly impact AI citation frequency, at least not yet,' and a 90-day, 10-site study recommended treating it as infrastructure comparable to sitemaps, not a growth strategy. Citation is still earned in the page itself, which is citability's job. Publish llms.txt because it is cheap and forward-compatible, not because it moves citations.

Do ChatGPT, Claude, Perplexity, or Gemini read my llms.txt?

No vendor has publicly documented that its crawler or inference pipeline consumes third-party /llms.txt, as of 2026-05. OpenAI's, Anthropic's, Perplexity's and Google's official crawler docs do not mention it; Google's John Mueller stated that server logs show the AI services 'don't even check for it.' Note the trap: Anthropic, Google and Perplexity each publish an llms.txt for their own docs — that is supply-side behaviour (they host one), not evidence their crawlers read yours.

Is llms.txt a replacement for robots.txt or sitemap.xml?

No — three root files, three different jobs. robots.txt is access control (may a bot fetch this path). sitemap.xml is discovery and completeness (here is everything, for indexing). llms.txt is curation and clean rendering (read these pages first, in clean markdown). None substitutes for another: llms.txt is not 'a sitemap for AI' (that conflates curation with completeness) and not 'robots.txt for AI' (that conflates a reading hint with an access rule).

Is llms.txt an official standard?

No. It was proposed by Jeremy Howard at Answer.AI in September 2024 and is governed informally by the community, not ratified by the IETF or W3C. The spec lives at llmstxt.org; community registries list adopters. Adoption is real but partial — a 300,000-domain study measured roughly 10% of sampled domains publishing one. Treat it as a proposed convention with momentum, not a settled standard.

Should I bother publishing llms.txt then?

Yes, if it is cheap — ideally build-generated — and you keep it in sync with the live site. The defensible case is not 'AI vendors read it' but: it is near-zero cost, forward-compatible if consumption arrives, and useful today for opt-in consumers you control (your own RAG, AI coding agents, third-party tools that choose to honor it). The one hard rule: a stale or uncurated llms.txt has negative value — it spends opt-in consumers' trust pointing them at the wrong pages.
