Concept · Infrastructure

Schema.org for AI

Quick facts

What it is: The AI-relevant subset of Schema.org — the types and properties that change how an engine resolves your entity and parses your page. Not the spec; the subset that touches AI
Is it a ranking or citation signal? No. It gates eligibility for features and aids entity resolution and parsing. Google states that markup enables a feature; it does not rank you or guarantee that the feature appears
Where it acts: The pre-retrieval parse + entity layer — not the grounding/selection gate that Citability and E-E-A-T govern. Markup makes an entity resolvable, not a passage liftable
Strongest evidence: Index-integrated AI (Google AI Overviews, Bing Copilot) uses it via the search index. Live-fetch chatbots (ChatGPT, Perplexity) read JSON-LD as plain page text, not parsed structured data (searchVIU, 2025)
Highest-leverage primitive: sameAs on Organization / Person — the entity-resolution join key into the knowledge graph. The one property worth getting right first

1. What “Schema.org for AI” is

This entry is not a mirror of the Schema.org documentation. It is the AI-citation-relevant subset: the handful of types and properties that change how an AI engine resolves your entity and parses your page — and nothing more.

Definition (GEO Wiki working definition): Schema.org for AI is the subset of structured-data vocabulary whose presence changes how an AI engine disambiguates the entity behind a page and parses the page reliably — distinct from whether any passage on it is liftable.

2. Markup ≠ citation — why schema is infrastructure, not a signal

The load-bearing honesty, and this entry’s counterpart to E-E-A-T §1’s “not a score”: structured data is not a ranking factor and not a citation lever. Google states it plainly — “using structured data enables a feature to be present, it does not guarantee that it will be present”, and a structured-data manual action “doesn’t affect how the page ranks” (see General Structured Data Guidelines). Google’s 2025 AI-search guidance repeats it: markup “makes pages eligible for certain search features and rich results”, not ranking (Succeeding in AI search).

What markup actually buys is exactly three things, all upstream of selection:

What schema buys	What it does not buy
Reliable, unambiguous parse of the page’s facts	A ranking or citation boost
Entity disambiguation (who/what you are) via the knowledge graph	A passage becoming liftable
Eligibility for structured/rich surfaces (where they still exist)	A guarantee the surface appears

Where it acts is the whole point. Schema operates at the pre-retrieval parse and entity layer — never at the grounding/selection gate that Citability and E-E-A-T govern in Answer Loop §3:

  page ──► [ PARSE + ENTITY LAYER ]      ◄── schema acts here
              │  facts parsed cleanly
              │  entity resolved (sameAs → KG)
              ▼
  retrieval ──► candidate passages
              ▼
  [ GROUNDING / SELECTION GATE ]         ◄── schema does NOT act here
   citability (shape) · E-E-A-T (trust)      Citability & E-E-A-T own this
              ▼
  grounded answer ──► (maybe) citation

The orthogonality line, stated as the reciprocal of Citability §2’s: marking up an FAQ does not make its answers citable. Passage shape is citability’s, decided in the visible content. Markup only declares structure a parser could already extract.

3. The AI-relevant type subset — the load-bearing table

This is the canonical table the rest of the site quotes, and this entry’s equivalent of E-E-A-T §4. Each type is read for what it asserts to an engine and which proxy it feeds, not for spec completeness.

Type	What it asserts to an AI	Proxy it feeds	Failure shape
`Organization`	This site/brand is this entity	Entity recognition · KG presence	No `sameAs`; entity stays ambiguous, never resolved
`Person`	This author/expert is this identity	Entity + the trust proxies E-E-A-T names	Anonymous byline; no resolvable identity
`Article` / `NewsArticle`	This page is an article, by X, dated Y	Type + authorship + freshness	Untyped page; author/date not machine-stated
`WebSite`	Site-level identity, search action	Site entity binding	Page-only signals, no site entity
`BreadcrumbList`	Where this sits in the site graph	Site architecture / context	Orphan page, no structural context
`FAQPage`	These Q&As exist on the page	Answer-shape declaration (see §2 + §6)	Treated as liftable — it is not; that is citability’s
`HowTo`	These ordered steps exist	Answer-shape declaration	Same — and its Google rich result was removed (§6)

The highest-leverage rows are Organization and Person, because they carry the property that actually feeds the layer AI consumes (covered next). FAQPage and HowTo are deliberately last: they describe shape a parser already sees and carry the §6 caveat.
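As a rough illustration of how the `Article`, `Person` and `Organization` rows fit together, the sketch below assembles one `Article` node whose author and publisher are typed entities rather than bare strings. The helper name `build_article_jsonld`, the person "Jane Example" and every URL are hypothetical placeholders; the property names (`headline`, `datePublished`, `author`, `publisher`, `sameAs`) are Schema.org's. It is a sketch of the shape, not a template.

```python
import json


def build_article_jsonld(headline, date_published, author, publisher):
    """Return an Article node with a typed Person author and Organization publisher."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        # Typed nodes, so authorship is machine-stated rather than a bare byline string.
        "author": {"@type": "Person", **author},
        "publisher": {"@type": "Organization", **publisher},
    }


# All values below are placeholders for illustration only.
article = build_article_jsonld(
    headline="Example headline",
    date_published="2025-01-15",
    author={
        "name": "Jane Example",
        "url": "https://example.com/authors/jane-example",
        "sameAs": ["https://www.linkedin.com/in/jane-example"],
    },
    publisher={"name": "Example Co", "url": "https://example.com"},
)

# Serialized form, ready to place in a JSON-LD script element.
article_json = json.dumps(article, indent=2)
```

Building the node as a dictionary and serializing it once keeps the output valid JSON by construction. It does nothing to guarantee that the values match the visible page, which remains the author's responsibility.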

4. The AI-relevant property subset — `sameAs` is the workhorse

Companion table, same reading. Properties, not types, are where the entity leverage concentrates.

Property	What it asserts	Proxy it feeds	Failure shape
`sameAs`	“This entity is the one at these URLs” (Wikipedia, Wikidata, official, socials)	Entity recognition · KG presence	Entity never joined to the graph; stays ambiguous
`mainEntity`	The primary thing this page is about	Topic/entity binding	Page about everything, resolved as nothing
`about` / `mentions`	Entities this content concerns/cites	Topical + entity graph	No machine-stated topic anchors
`author`	The `Person`/`Organization` behind it	Authorship → trust proxies	Unattributed; trust proxy missing
`knowsAbout` / `hasOccupation`	An author’s domain and role	Expertise corroboration	Asserted expertise with nothing to resolve
`speakable`	Sections fit for text-to-speech	A beta, US/EN/news-only feature	Over-relied on; not a general surface (Google, beta)

sameAs is the entity-resolution join key — the single property worth getting right before any other. It is the explicit edge from your markup to the knowledge graph the model already trusts. A minimal, illustrative block:

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Example_Co",
    "https://www.wikidata.org/wiki/Q000000",
    "https://www.linkedin.com/company/example-co"
  ]
}

This is one illustrative block, not a template. Syntax (JSON-LD vs Microdata vs RDFa, placement, escaping) belongs to the JSON-LD entry. Full per-type templates and validation belong to the Schema Implementation playbook. Which markup feeds the entity layer, and why, is covered below; the resolution mechanism is treated in Entity Recognition and Knowledge Graph Presence.
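Because `sameAs` is the property this entry recommends getting right first, a small pre-publish sanity check can be useful. The sketch below is a hypothetical helper (`same_as_problems` is not part of any library or validator) that only checks form: that `sameAs` exists, is non-empty, contains absolute http(s) URLs, and has no duplicates. It cannot tell whether the URLs actually describe the same entity.

```python
from urllib.parse import urlparse


def same_as_problems(node):
    """Return a list of human-readable problems with a node's sameAs value."""
    value = node.get("sameAs")
    if value is None:
        return ["sameAs is missing"]
    # Schema.org allows a single URL or a list of URLs; normalize to a list.
    urls = [value] if isinstance(value, str) else list(value)
    if not urls:
        return ["sameAs is empty"]
    problems = []
    for url in urls:
        parsed = urlparse(url) if isinstance(url, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an absolute http(s) URL: {url!r}")
    strings = [u for u in urls if isinstance(u, str)]
    if len(set(strings)) != len(strings):
        problems.append("duplicate URLs in sameAs")
    return problems


# The illustrative Organization block from this section passes the form check.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
    "sameAs": [
        "https://en.wikipedia.org/wiki/Example_Co",
        "https://www.wikidata.org/wiki/Q000000",
        "https://www.linkedin.com/company/example-co",
    ],
}
ok_result = same_as_problems(org)

# A scheme-less URL is flagged, because it is not an absolute URL.
bad_result = same_as_problems({"@type": "Organization", "sameAs": ["linkedin.com/company/example-co"]})
```

A check like this catches typos and relative URLs early. Corroboration against the knowledge graph is a separate matter that no local check can establish.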

5. How AI engines actually consume schema — the honest mechanism, per surface

The honesty section, mirror of E-E-A-T §5. The core: engines do not “read your schema and rank you.” How — and whether — markup is consumed at answer time splits hard by surface, and the evidence now exists to say so rather than hedge.

Surface	How schema is consumed	Evidence strength
Google AI Overviews / AI Mode	Via Google’s existing index and structured-data systems — AI search “is still search”; same eligibility rules, no AI-specific markup required	Strongest — Google’s own docs (AI features)
Bing Copilot	Via the Bing index — Microsoft has confirmed structured data is used	Strong — vendor-confirmed
ChatGPT / Perplexity (live fetch)	The page is fetched and rendered to text; JSON-LD is read as plain text, not parsed as a graph	Strong (negative) — controlled test
Claude / Gemini (direct fetch)	Same: no evidence of dedicated JSON-LD parsing at answer time	Consistent with the above

The negative result is well-supported, not speculative. A controlled December 2025 test placed a price only inside JSON-LD across five systems; none of the live-fetch chatbots extracted it (searchVIU). An independent observation found ChatGPT and Perplexity will surface values even from invalid, fabricated schema — they are reading the markup as text on the page, not as a parsed structure (Search Engine Roundtable, observation).

The seam, restated: the entity benefit still reaches these models — but through the model prior and the knowledge graph, not by parsing your JSON-LD during the fetch. Why each markup is an entity proxy is this entry’s; how the identity resolves across platforms is Entity Recognition’s and Knowledge Graph Presence’s.
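The difference between reading a page and parsing its markup can be made concrete with a toy reproduction of the JSON-LD-only price setup. The sketch below uses Python's standard `html.parser`; the `PageReader` class, the sample page and the price are all invented for illustration, and nothing here claims to mirror how any named engine is implemented. It shows only that a value placed solely in JSON-LD is absent from the rendered text, present as raw characters in the source, and available as a typed field only to a consumer that deliberately parses the script block.

```python
import json
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collect visible text and JSON-LD script bodies separately."""

    def __init__(self):
        super().__init__()
        self.visible_text = []
        self.jsonld_blocks = []
        self._in_script = False
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._in_jsonld = dict(attrs).get("type") == "application/ld+json"
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "script":
            if self._in_jsonld:
                self.jsonld_blocks.append("".join(self._buffer))
            self._in_script = False
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)
        elif not self._in_script and data.strip():
            self.visible_text.append(data.strip())


# Hypothetical page: the price exists only inside the JSON-LD block.
PAGE = """<html><body>
<h1>Example Widget</h1>
<p>A sturdy widget for everyday use.</p>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"}}
</script>
</body></html>"""

reader = PageReader()
reader.feed(PAGE)
reader.close()

visible = " ".join(reader.visible_text)
parsed = [json.loads(block) for block in reader.jsonld_blocks]

price_in_visible_text = "19.99" in visible      # rendered text only
price_in_raw_source = "19.99" in PAGE           # characters on the page
price_from_parse = parsed[0]["offers"]["price"]  # explicit structured parse
```

The practical reading matches the section: anything an answer should be able to quote needs to appear in the visible content, with the markup mirroring it.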

6. What the evidence says — and what it does not

The bounded-reading section, same honesty discipline as E-E-A-T §6.

What holds	The bounded reading
Index-integrated AI (Google, Bing) uses structured data	Through the index, as eligibility — Google states it is not a ranking boost
Valid markup that matches content reduces extraction ambiguity	It clarifies what is already there; it cannot manufacture trust or liftability
Schema coverage does not correlate with AI citation rates	A Dec-2024 study found no correlation; treat schema as hygiene, not a lever (Search Engine Land)
Rich-result surfaces can be revoked unilaterally	Google restricted FAQ rich results to gov/health and removed HowTo entirely in 2023 (Google)

The FAQ/HowTo deprecation is the cleanest cautionary datum: a surface that schema “earned” was withdrawn by the vendor in one announcement. Markup is not a durable benefit you own.

One boundary on the GEO literature, stated explicitly: Aggarwal et al. measured rewrites of content substance and structure (citing sources, adding statistics, adding quotations) and did not test schema markup as a variable (KDD ’24, arXiv:2311.09735; paper summary). The headline GEO numbers therefore do not transfer to “add schema.” Borrowing them here would be the exact over-claim §7 warns against.

The position, the reciprocal of E-E-A-T §6’s “earned, not annotated”: schema is declared, not rewarded. It lets engines trust what is already on the page; it cannot create what is not.

7. Anti-patterns — schema spam and why it backfires

Mirror of E-E-A-T §7. Each pattern looks like the signal it imitates and fails on a trust or anti-abuse filter.

Anti-pattern	Why it looks like it works	Why it actually fails
Markup not matching visible content	Looks like rich structure	Google manual action strips eligibility; text-reading AI sees the contradiction directly
`FAQPage` stuffing for SERP real estate	Looks like answer coverage	Rich result restricted to gov/health since 2023; no payoff, accuracy risk
Fabricated `Organization` / `Person`	Looks like a resolved entity	Fails `sameAs` / KG corroboration — the same failure as fake authorship in E-E-A-T §7
Over-marking every element	Looks thorough	Noise, validation errors, mismatch risk; no upside
JSON-LD contradicting on-page text	Looks complete	Live-fetch AI reads both as text and trusts neither

The load-bearing line: invalid or content-mismatched schema is worse than none. It trips AI anti-abuse the way fabricated authority trips trust filters — the over-claim pattern that AI Content Detection covers. Google’s standing position is that there are no special markup tricks; markup must mirror content that is already visible.
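The "markup must mirror visible content" rule lends itself to a crude automated check. The sketch below walks a JSON-LD node, collects its non-URL string values, and reports those that do not appear in the page's visible text. The function names and the FAQ sample are hypothetical, and plain substring matching is a deliberate simplification: it will miss paraphrases and flag legitimate formatting differences. Treat it as a first-pass lint, not as a stand-in for Google's own validation.

```python
def iter_string_values(node, path=""):
    """Yield (path, value) for each string leaf, skipping JSON-LD keywords and URLs."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@"):
                continue  # @context, @type, @id are not page content
            yield from iter_string_values(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_string_values(item, f"{path}[{index}]")
    elif isinstance(node, str) and not node.startswith(("http://", "https://")):
        yield path, node


def unmatched_values(jsonld, visible_text):
    """Return (path, value) pairs whose text is not found in the visible page text."""
    haystack = " ".join(visible_text.split()).casefold()
    return [
        (path, value)
        for path, value in iter_string_values(jsonld)
        if " ".join(value.split()).casefold() not in haystack
    ]


# Hypothetical FAQ markup whose answer contradicts the visible page.
markup = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Does the widget ship abroad?",
            "acceptedAnswer": {"@type": "Answer", "text": "Yes, to 40 countries."},
        }
    ],
}
visible = "Does the widget ship abroad? Yes, to 12 countries."

mismatches = unmatched_values(markup, visible)
# mismatches == [("mainEntity[0].acceptedAnswer.text", "Yes, to 40 countries.")]
```

Any non-empty result is a prompt to reconcile the markup with the page before publishing, in whichever direction reflects the truth.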

8. Schema across SEO and GEO — invariant baseline vs what changes

Mirror of E-E-A-T §8; this restates SEO vs GEO’s shared-baseline contract rather than re-deriving it.

Invariant: valid markup that matches content is a shared SEO+GEO baseline — on the “never drop” list. It costs little, it cannot be the differentiator, and removing it degrades both blue links and machine parseability.

What changes is the consumer: from a rich-result renderer to an entity/parse layer feeding the model’s prior.

Surface	Schema delta
Google AI Overviews	The native home — index-based; schema reused from Google’s existing systems, weighed as eligibility not rank
Live-fetch chatbots	Markup read as page text; the value is indirect, via entity presence in the prior/KG — not the JSON-LD on the page

Two routed lines, not expanded: the trust-readability of non-text assets — ImageObject/VideoObject provenance — is Multimodal Signals’; and the format choice underneath all of this is JSON-LD’s.

9. Why this matters for GEO + how to act

Schema is infrastructure that feeds the entity layer — not a lever on the grounding choke point Answer Loop §3 calls highest-leverage. Get it correct and out of the way; spend the real effort on citability and trust. This entry is the concept; the doing is the playbook.

Your intent	First stop
Implement or fix markup correctly	Schema Implementation
Decide format / syntax	JSON-LD
Understand why markup feeds entity resolution	Entity Recognition · Knowledge Graph Presence
Audit schema as part of the whole site	Full GEO Audit
Make a passage actually liftable	Citability
See where this sits in the loop	Answer Loop
The method that ties it together	Generative Engine Optimization

For the term itself and its neighbors, see the GEO glossary.

References

Official (Google):

Google Search Central — General Structured Data Guidelines · Introduction to structured data markup
Google Search Central — Changes to HowTo and FAQ rich results (2023-08-08)
Google Search Central — AI features and your website · Top ways to ensure your content performs well in Google’s AI experiences (2025-05-21)
Google Search Central — Speakable structured data (beta)

Vocabulary:

Schema.org — Organization, Person, sameAs, FAQPage, HowTo, Article, speakable

Independent / industry:

searchVIU — Schema Markup and AI in 2025: What ChatGPT, Claude, Perplexity & Gemini Really See (2025-12-02)
Search Engine Land — How schema markup fits into AI search — without the hype (2026-03-25)
Search Engine Roundtable — ChatGPT & Perplexity Treat Structured Data As Text On A Page [observation] (2026-02-03)

Academic (boundary reference — schema not a tested variable):

Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K. & Deshpande, A. (2024). GEO: Generative Engine Optimization. KDD ’24. arXiv:2311.09735 · paper summary

Frequently asked questions

Does Schema.org markup get my content cited by AI?

No — not directly. Markup is not a ranking or citation signal. It does two things: it makes a page reliably parseable, and it disambiguates the entity behind it (who/what you are) so the model can resolve you against its knowledge graph. Whether a passage is then lifted into an answer is decided by citability (its structure) and E-E-A-T (its source trust) at the grounding gate — schema does not act there. The honest model: markup makes an entity resolvable, not a passage liftable.

Do ChatGPT and Perplexity read my JSON-LD?

Not as structured data, at answer time. A controlled 2025 test (searchVIU) placed a price only inside JSON-LD and queried five systems; none of the live-fetch chatbots extracted it. Independent observation found ChatGPT and Perplexity will even surface values from invalid, made-up schema — meaning they read the markup as plain text on the page, not as a parsed graph. The entity benefit still reaches them, but through the model's prior and the knowledge graph, not by parsing the JSON-LD on your page during the fetch.

Which schema types matter most for AI?

Organization and Person — because they carry sameAs, the join key that resolves your entity into the knowledge graph, which is the part AI actually consumes. Article gives the page a clean type and authorship. FAQPage and HowTo declare answer shape a parser can already see, but they do not make those answers citable and their Google rich results were curtailed in 2023. Prioritise the entity primitives over the answer-shape ones.

Is FAQPage or HowTo schema still worth adding?

For AI, only marginally, and not for the rich result. Google restricted FAQ rich results to authoritative government and health sites and removed HowTo rich results entirely in 2023 — so the SERP payoff is largely gone. The markup still validly describes structure, but it does not make the underlying answers liftable; that is citability's job, done in the visible content. Add it if it is cheap and accurate; do not expect it to move AI citation on its own.

Can schema markup hurt me?

Yes. Markup that does not match the visible page is the main failure mode: Google issues structured-data manual actions that strip rich-result eligibility, and AI systems that read markup as page text will see the contradiction directly. Fabricated Organization or Person markup fails sameAs and knowledge-graph corroboration the same way fake authorship fails E-E-A-T. Invalid or content-mismatched schema is worse than none.
