Sitemap & IndexNow
Quick facts
- What they are: two URL-submission protocols, sitemap.xml (2005, pull-mode, every major search engine) and IndexNow (2021, push-mode, the Microsoft + Yandex ecosystem)
- The load-bearing line: both affect AI search visibility only via the host search index. Google AI Overviews inherits sitemap behavior from Google; Bing Copilot inherits both from Bing; standalone answer engines consume neither directly
- IndexNow participants (2026-05): Microsoft Bing · Yandex · Naver · Seznam.cz · Yep. Google has not adopted since its 2021 testing announcement; no AI vendor's first-party crawler participates
- The costliest mistake: pushing IndexNow expecting ChatGPT / Perplexity / Claude to refresh. That is the wrong protocol scope: IndexNow → Bing → Copilot is its only AI-side path; standalone engines crawl on their own schedules
- Where it sits: upstream of AI Crawlers in the answer loop. Discovery, not citation. Necessary, not sufficient: being submitted does not mean being liftable
1. What sitemap.xml and IndexNow are
Sitemap.xml and IndexNow are two URL-submission protocols that solve adjacent but distinct problems. Sitemap.xml is the older one — published in 2005 as an open standard (sitemaps.org) — and works in pull mode: a publisher hosts a list of URLs at a known path, and search engines fetch the file on their own crawl schedule. Every major search crawler honors it, including Googlebot, Bingbot, and YandexBot.
IndexNow is the newer one — launched in 2021 by Microsoft and Yandex (indexnow.org) — and works in push mode: a publisher notifies a participating endpoint the moment a URL changes, and the receiving engine refreshes its index in minutes. The participant list as of 2026-05 is Microsoft Bing, Yandex, Naver, Seznam.cz, and Yep (indexnow.org). Google has not adopted IndexNow since its 2021 testing announcement, and no AI vendor’s first-party crawler participates.
For GEO purposes, the load-bearing fact is downstream of either protocol’s mechanics: their AI-search effect transits only through the host search index. Google AI Overviews inherits Google’s sitemap behavior because AIO grounds in Google’s classic web index; Bing Copilot inherits Bing’s sitemap and IndexNow behavior for the same reason. Standalone answer engines — ChatGPT Search, Perplexity, Claude — run their own retrieval crawlers on their own crawl schedules, and their published crawler documentation does not mention sitemap.xml or IndexNow consumption.
In the answer-loop sequence, this layer sits upstream of AI Crawlers: discovery (you exist on someone’s URL list) comes before crawl (a bot fetches you), which comes before retrievability (the bot’s index includes you), which comes before citability (you are liftable when read). Submission is necessary, not sufficient — clearing it makes you a candidate, not a citation.
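To make the pull-mode file from this section concrete, here is a minimal sketch of generating a sitemap.xml body with Python's standard library. The function name `build_sitemap` and the example URLs are illustrative assumptions, not part of any tool referenced on this page; the namespace and the `urlset` / `url` / `loc` / `lastmod` element names come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemap.xml document as a string.

    entries: iterable of (loc, lastmod) pairs; lastmod may be None.
    (Hypothetical helper, shown only to illustrate the file's shape.)
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in text nodes.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.com/", "2026-05-01"),
        ("https://example.com/pricing", None),
    ]))
```

The output is what a crawler fetches on its own schedule; nothing in the file notifies anyone, which is the practical meaning of "pull mode".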
2. The mechanics — what each protocol actually does
The two protocols differ on six operational dimensions, summarized below.
| Aspect | sitemap.xml | IndexNow |
|---|---|---|
| Standard | sitemaps.org (2005, open protocol) | indexnow.org (2021, Microsoft + Yandex) |
| Mode | Pull — engine fetches on its own crawl schedule | Push — publisher notifies on URL change |
| Consumed by | Googlebot, Bingbot, YandexBot, and effectively every major search crawler | Microsoft Bing, Yandex, Naver, Seznam.cz, Yep — not Google, not any AI vendor’s first-party crawler |
| Latency | Hours to days, depending on the engine’s crawl schedule | Minutes, push-on-change |
| Declared at | Sitemap: directive in robots.txt + optional Search Console / Webmaster Tools submission | One HTTP request per URL change to any participating endpoint, with the URL plus a key |
| Scale limits | 50,000 URLs and 50 MB (uncompressed) per file; sitemap indexes for sites above that (sitemaps.org) | Up to 10,000 URLs per batched POST; arbitrary key tokens hosted at site root |
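The scale limits in the last row translate directly into batching logic. The sketch below assumes a hypothetical helper named `chunked`; the two constants mirror the per-file and per-POST URL caps from the table. It does not check the 50 MB uncompressed size cap, which would need a separate byte count per file.

```python
from typing import Iterator, List, Sequence

SITEMAP_MAX_URLS = 50_000   # per-file URL cap from the table above
INDEXNOW_MAX_URLS = 10_000  # per-POST URL cap from the table above


def chunked(urls: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


if __name__ == "__main__":
    urls = [f"https://example.com/p/{i}" for i in range(120_000)]
    sitemap_files = list(chunked(urls, SITEMAP_MAX_URLS))
    indexnow_batches = list(chunked(urls, INDEXNOW_MAX_URLS))
    print(len(sitemap_files), len(indexnow_batches))
```

Each sitemap chunk would become its own file listed in a sitemap index; each IndexNow chunk would become one batched POST.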
The publisher-side IndexNow surface is intentionally minimal — one HTTP request per URL change, one key file at the site root, no API tokens, no per-engine authentication. A complete minimal push looks like:
    # IndexNow: the minimal push (the principle, not a full client)
    1. Host <your-key>.txt at your site root, body containing <your-key>
       e.g. https://example.com/abc123.txt → body: abc123
    2. On URL change, GET (single-URL form):
         https://api.indexnow.org/IndexNow?url=https://example.com/page&key=abc123
       Or POST (batch form, up to 10k URLs):
         POST https://<participating-engine>/indexnow
         Content-Type: application/json
         { "host": "example.com",
           "key": "abc123",
           "urlList": ["https://example.com/page1", "https://example.com/page2"] }
    3. The receiving engine shares the submission with the other IndexNow
       participants (Bing → Yandex → Naver → Seznam → Yep) automatically.
       Submit to one, reach all five.
The full JSON shape and per-engine endpoint list are in the official spec; CDNs including Cloudflare publish IndexNow at the edge with a single toggle (Cloudflare, 2021), so many sites already emit IndexNow signals without site-side code.
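For sites that do push from their own code, the walkthrough above can be sketched in Python using only the standard library. The function names `single_url_request` and `batch_request` are hypothetical; the endpoint, the `url` / `key` query parameters, and the `host` / `key` / `urlList` JSON fields are the ones shown in the walkthrough. The key `abc123` is a placeholder only: the official spec constrains key length and character set, so check it before generating a real key. The sketch builds requests without sending them.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the walkthrough above; any participating engine's
# endpoint could be passed instead.
DEFAULT_ENDPOINT = "https://api.indexnow.org/IndexNow"
MAX_BATCH = 10_000  # batch cap stated in the mechanics table


def single_url_request(url, key, endpoint=DEFAULT_ENDPOINT):
    """Return the GET URL for a single-URL push, with the page URL percent-encoded."""
    query = urllib.parse.urlencode({"url": url, "key": key})
    return f"{endpoint}?{query}"


def batch_request(host, key, urls, endpoint=DEFAULT_ENDPOINT):
    """Return an unsent POST request carrying a batch of changed URLs."""
    urls = list(urls)
    if len(urls) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} URLs per batch")
    payload = {"host": host, "key": key, "urlList": urls}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    print(single_url_request("https://example.com/page", "abc123"))
    req = batch_request("example.com", "abc123",
                        ["https://example.com/page1", "https://example.com/page2"])
    print(req.get_method(), req.full_url)
    # To actually send: urllib.request.urlopen(req)
```

Percent-encoding the page URL matters in the GET form: a raw `&` or `?` inside the submitted URL would otherwise be read as part of the IndexNow query string.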
A note on Google and IndexNow: Google announced in November 2021 that it would test the protocol for sustainability gains (Search Engine Land, 2021-11-09). The test did not move to adoption; as of 2026 independent coverage continues to treat Google’s absence as the status quo (ppc.land, 2024-12-30). The operational implication: a single IndexNow push reaches Bing, Yandex, Naver, Seznam, and Yep — and through them, on the AI side, only Bing Copilot.
3. Per-engine effect on AI visibility — the canonical table
This table is the single most-quoted asset on this page. The effect of sitemap.xml and IndexNow on an AI surface is determined by whether that surface reuses a host search index, and the engines split cleanly into three groups.
| AI surface | sitemap.xml | IndexNow | Why |
|---|---|---|---|
| Google AI Overviews | Used (via Google index) | Not used — Google has not adopted IndexNow | AIO grounds in Google’s classic web index; sitemap submission via Search Console accelerates Google discovery, which feeds AIO’s candidate pool |
| Bing Copilot | Used (via Bing index) | Used (via Bing index) | IndexNow is Microsoft’s protocol; Copilot inherits Bing’s index, so a push refreshes Bing in minutes and Copilot’s grounding pool with it |
| ChatGPT Search · Perplexity · Claude | Indirect — each runs its own retrieval crawler on its own crawl schedule; the Sitemap: directive in robots.txt may aid discovery but is not required | Not used — none of the three documents IndexNow or any submission protocol in its crawler docs | Each engine maintains an independent retrieval index; no public URL-submission channel exists for any of them |
The load-bearing read of this table: submission protocols affect an AI surface only where that surface reuses a host search index that consumes them. Google AI Overviews inherits once (sitemap), Bing Copilot inherits twice (sitemap + IndexNow), and standalone answer engines inherit nothing; they crawl on their own. “Push IndexNow to appear in ChatGPT” is therefore a category error (§6).
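The per-engine table can also be held as a small lookup, which is handy inside an audit script. This is a sketch only: the dictionary name `REACH`, the status labels, and the function `channels_via_host_index` are illustrative assumptions, and the values simply restate the table rows as of 2026-05.

```python
# Status labels are illustrative shorthand for the table cells above.
VIA_HOST_INDEX = "via host index"
INDIRECT = "indirect"
NOT_USED = "not used"

REACH = {
    "Google AI Overviews": {"sitemap.xml": VIA_HOST_INDEX, "IndexNow": NOT_USED},
    "Bing Copilot": {"sitemap.xml": VIA_HOST_INDEX, "IndexNow": VIA_HOST_INDEX},
    "ChatGPT Search": {"sitemap.xml": INDIRECT, "IndexNow": NOT_USED},
    "Perplexity": {"sitemap.xml": INDIRECT, "IndexNow": NOT_USED},
    "Claude": {"sitemap.xml": INDIRECT, "IndexNow": NOT_USED},
}


def channels_via_host_index(surface):
    """Return the protocols that reach `surface` through a host search index."""
    row = REACH[surface]
    return sorted(p for p, status in row.items() if status == VIA_HOST_INDEX)


if __name__ == "__main__":
    for surface in REACH:
        print(surface, channels_via_host_index(surface))
```

An empty list for a surface is the code-level form of "no submission channel exists": the work for that surface lives in citability and crawler access instead.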
4. “Submit to the AI index” — what does and does not exist
The most common practitioner question on this layer is: how do I submit my URL to ChatGPT, Perplexity, or Claude? As of 2026-05, the answer is: directly, you can’t. No first-party URL-submission channel exists for any major standalone answer engine.
Submission channels that do exist — each reaches an AI surface transitively, via a host search index:
- Google Search Console — sitemap submission and URL Inspection feed Google’s classic web index, which AI Overviews reuses (Build and submit a sitemap)
- Bing Webmaster Tools — sitemap submission and URL Submit feed Bing’s index, which Bing Copilot reuses
- IndexNow — push notification refreshes Bing’s index (and Yandex’s, Naver’s, Seznam’s, Yep’s) in minutes; reaches Bing Copilot transitively at the lowest available latency (How to add IndexNow)
Submission channels that do not exist:
- A first-party OpenAI / Anthropic / Perplexity URL-submission API — their crawler docs cover robots.txt, IP-range allowlisting, and access policy but say nothing about how to submit URLs (OpenAI; Anthropic; Perplexity)
- IndexNow consumption by Google — frequently asked, but Google has not adopted the protocol
- Any single channel that reaches all AI surfaces at once
The honest framing: there is no “submit to AI” today — only “submit to a host index” and wait for the AI surface to inherit. When a client asks where to push their page so it shows up in ChatGPT or Perplexity, the answer is not a submission protocol. It is the work that makes the page liftable once a retrieval crawler reaches it (citability) plus making sure that crawler can reach the page at all (AI Crawlers).
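Since no submission channel exists for standalone engines, the checkable part is whether their crawlers are allowed to fetch the page. A minimal sketch with Python's `urllib.robotparser` follows. The helper name `crawler_access` is hypothetical, and the user-agent tokens shown (`GPTBot`, `OAI-SearchBot`) are the OpenAI ones named in this page's sources; tokens for other vendors should be taken from each vendor's own crawler documentation rather than from this example.

```python
from urllib.robotparser import RobotFileParser


def crawler_access(robots_txt, page_url, agents):
    """Map each user-agent token to whether robots.txt lets it fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in agents}


if __name__ == "__main__":
    robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    report = crawler_access(robots, "https://example.com/page",
                            ["GPTBot", "OAI-SearchBot"])
    for agent, allowed in report.items():
        print(agent, "allowed" if allowed else "blocked")
```

This checks robots.txt policy only. IP-level blocks, WAF rules, and rendering problems can still stop a crawler that robots.txt allows.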
5. Sitemap.xml ≠ llms.txt ≠ robots.txt — three files, three jobs
Three root-level files appear in any complete crawler-and-discovery setup, and they are routinely confused. Each one does exactly one job and refuses to do the others.
| File | What it does | What it does not do |
|---|---|---|
| robots.txt | Access — may a bot fetch this path at all | Does not list URLs, signal freshness, or claim completeness |
| sitemap.xml | Discovery + completeness — here is everything I want indexed | Does not curate, does not grant access, does not act as a quality signal |
| llms.txt | Curation + clean rendering — read these pages first, in clean markdown | Does not grant or deny access, does not claim completeness, does not signal indexing intent |
The load-bearing line: sitemap.xml is not a curation file, not an access rule, not a “best of” list. It is a completeness manifest. Trying to slim sitemap.xml down to “only the pages I want AI engines to see” misreads the protocol’s job. Curation belongs to llms.txt; sitemap.xml must reflect everything you want indexed, or the engine treats the omission as a coverage gap. The llms.txt page (§4) presents the same three-file table from the llms.txt side; read together, the two tables are the canonical sitemap-vs-llms.txt disambiguation.
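The one place these files touch is the `Sitemap:` directive, which lets robots.txt point at sitemap.xml without taking over its job. As a hedged sketch, Python's standard `RobotFileParser.site_maps()` (available from Python 3.8) can confirm the directive is present; the helper name `declared_sitemaps` and the example URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared via Sitemap: lines, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: directive is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    robots = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""
    print(declared_sitemaps(robots))
```

An empty result does not mean the site is undiscoverable; it means discovery then depends on Search Console or Webmaster Tools submission and on ordinary link crawling.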
6. Anti-patterns — when submission backfires or wastes effort
Each anti-pattern below sounds right and fails because it confuses a protocol’s scope, a participating-engine list, or a file’s job.
| Anti-pattern | Why it sounds right | Why it actually fails |
|---|---|---|
| “Push IndexNow → appear in ChatGPT, Perplexity, or Claude” | IndexNow is an open standard, and some AI vendors say they “respect web standards” | The IndexNow participant list (indexnow.org) is Bing, Yandex, Naver, Seznam, and Yep; no AI vendor’s crawler is on it. IndexNow → Bing → Copilot is the only path that reaches any AI surface |
| “Curate sitemap.xml to send AI engines only the best pages” | Curation feels GEO-aligned: quality over quantity | Misreads sitemap.xml’s job. The protocol is about completeness: every URL you want indexed. An engine treats a curated subset as a coverage signal, not a quality one. Curation is llms.txt’s job |
| “Bloat sitemap.xml with noindex or canonical-to-elsewhere URLs” | “More URLs = wider discovery” | Engines treat inconsistency between sitemap and on-page signals as quality-signal noise. Google’s sitemap doc names the URL classes that belong, and the ones that don’t (Build and submit a sitemap) |
| “Try IndexNow against Google anyway; it can’t hurt” | “Push is always better than pull; worst case, nothing happens” | Google has not adopted IndexNow since its 2021 testing announcement (Search Engine Land, 2021; status unchanged through 2024, per ppc.land). The submission goes nowhere, and the monitoring noise is real. For Google, use Search Console |
The line that closes the section: submission is a hygiene baseline, not a GEO lever. It earns you candidacy at the host index, which an AI surface may or may not reuse. The signals that decide whether you are actually picked once you are in the candidate pool live elsewhere — citability (chunk shape, quotable claims), E-E-A-T, entity recognition. Over-investing in the submission layer (curating sitemaps, pushing IndexNow at non-participating engines) does not move citation; under-investing forfeits candidacy. Get it correct and move on.
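The “bloat” anti-pattern is the easiest one to guard against mechanically. The sketch below assumes page metadata has already been collected by some crawl or CMS export; the `Page` dataclass and `sitemap_eligible` function are hypothetical names, and the two exclusion rules (noindex pages, pages whose canonical points elsewhere) are the ones named in the table above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Page:
    url: str
    noindex: bool = False            # page carries a noindex directive
    canonical: Optional[str] = None  # canonical URL declared on the page, if any


def sitemap_eligible(pages: Iterable[Page]) -> List[str]:
    """Return URLs that are consistent with their own on-page signals."""
    keep = []
    for page in pages:
        if page.noindex:
            continue  # a noindex page contradicts "please index this"
        if page.canonical and page.canonical != page.url:
            continue  # the canonical target belongs in the sitemap instead
        keep.append(page.url)
    return keep


if __name__ == "__main__":
    pages = [
        Page("https://example.com/"),
        Page("https://example.com/draft", noindex=True),
        Page("https://example.com/p?ref=x", canonical="https://example.com/p"),
        Page("https://example.com/p", canonical="https://example.com/p"),
    ]
    print(sitemap_eligible(pages))
```

This filter removes contradictions, not pages you merely consider weak; dropping indexable pages on quality grounds would be the curation anti-pattern from the same table.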
7. Why this matters for GEO + how to act
| Your intent | First stop |
|---|---|
| Get my site discovered by Google → AIO candidate eligibility | Google Search Console sitemap submission (Build and submit a sitemap) + Sitemap: directive in robots.txt |
| Get my site discovered by Bing → Copilot candidate eligibility | Bing Webmaster Tools sitemap + IndexNow integration (How to add IndexNow) |
| Push fresh URLs to Bing / Copilot in minutes | IndexNow per indexnow.org/documentation — see §2 for the minimal call |
| Get cited by ChatGPT, Perplexity, or Claude | No submission channel exists today. Invest in citability and confirm the retrieval crawler can reach you — see AI Crawlers |
| Audit index coverage and crawler access as part of a GEO sweep | GEO Audit — sitemap presence + crawler access are two checkpoints |
| Make sure a bot can parse the page once it gets there | SSR for AI Crawlers — a separate problem with a separate fix |
| Govern crawler access at the protocol level | AI Crawlers · robots.txt |
| Distinguish sitemap.xml from llms.txt and robots.txt | §5 above; the mirror table from llms.txt §4 |
The closing read, stated plainly: sitemap.xml is a hygiene baseline GEO never drops; IndexNow is a low-cost investment worth making for the share of your AI exposure that flows through Bing Copilot. Neither is a load-bearing GEO lever; those still live in citability, E-E-A-T, and entity recognition. Get the indexing layer correct, then spend marginal effort where citation is actually decided.
References
Primary — protocol specifications:
- sitemaps.org — Sitemaps XML format — Protocol (open standard, since 2005)
- indexnow.org — Documentation · Homepage (participant list)
Primary — engine documentation:
- Google Search Central — Build and submit a sitemap
- Microsoft Bing — How to add IndexNow to your website · Bing Webmaster Guidelines
- OpenAI — Overview of OpenAI Crawlers (no submission protocol documented)
- Anthropic — Does Anthropic crawl data from the web (no submission protocol documented)
- Perplexity — Perplexity Crawlers (no submission protocol documented)
Primary — historical anchor:
- Search Engine Land — Google is testing the IndexNow protocol for sustainability (2021-11-09; the announcement that did not move to adoption)
Secondary — independent coverage and infrastructure:
- ppc.land — Google’s absence from IndexNow raises questions about web indexing standards (2024-12-30; current-status framing)
- Cloudflare — Cloudflare now supports IndexNow (2021-10-18; edge-level automatic IndexNow push)
- Search Engine Land — IndexNow — new initiative by Microsoft and Yandex (2021-10-18; launch coverage)
Frequently asked questions
Does submitting a sitemap or pushing IndexNow get my page cited by AI search engines?
Does Google accept IndexNow pushes?
Do ChatGPT Search, Perplexity, or Claude read sitemap.xml or IndexNow?
Should I curate sitemap.xml to send AI engines only my best pages?
What's the right way to submit my URL to AI engines today?
Sources
Primary
- Sitemaps XML format — Protocol · sitemaps.org
- IndexNow — Documentation · indexnow.org
- IndexNow — homepage (participating engines) · indexnow.org
- How to add IndexNow to your website (Bing Webmaster Tools) · Microsoft Bing
- Build and submit a sitemap · Google Search Central · 2025-12-10
- Bing Webmaster Guidelines · Microsoft Bing
- Overview of OpenAI Crawlers (GPTBot / OAI-SearchBot / ChatGPT-User) · OpenAI
- Does Anthropic crawl data from the web, and how can site owners block the crawler? · Anthropic · 2026-04-07
- Perplexity Crawlers · Perplexity AI
- Google is testing the IndexNow protocol for sustainability · Search Engine Land · 2021-11-09