Entity Recognition
Quick facts
- What it is: The layer that maps a name on your page (brand, product, author) to a canonical entity the engine already knows, and disambiguates it from collisions.
- Why it matters: It is the join. A mention, citation, or sameAs only pays out if it resolves to the right node; unresolved credit leaks, mis-resolves, or is dropped.
- Where it acts: The pre-retrieval parse + entity layer, upstream of the grounding/selection gate Citability and E-E-A-T govern. It decides whose ledger credit lands in, not which passage is lifted.
- The join key: sameAs is the explicit resolution edge. Google states it is “used behind the scenes to disambiguate your organization from other organizations”.
- The honest bound: Resolution is corroborated, not asserted. One sameAs does not resolve you; one consistent identity attested across sources does.
1. What entity recognition is
Entity recognition is the layer that decides whether an AI engine can tell who or what you are — and join the signals you have earned to the right node. This entry treats it as the GEO-relevant subset of the academic named-entity recognition / entity linking literature, not as the literature itself.
Definition (GEO Wiki working definition): entity recognition, as a GEO signal, is the process by which an AI engine maps a surface form — a name, brand, product, or author string — to a canonical entity it already knows, disambiguating it from collisions, so that any credit attached to that string accrues to the right node.
2. Why an unresolved entity cannot be credited — the load-bearing honesty
This entry’s counterpart to Brand Mentions §5’s “a link with no mention is not a completed authority play” and Citability §2’s “necessary, not sufficient”: a mention, a citation, or a markup assertion only pays out if the engine resolves it to the right node. Resolution is the join, not a nicety. Being named or cited but unresolved is credit that leaks.
    mention / citation / sameAs assertion
                    │
                    ▼
       [ ENTITY RESOLUTION LAYER ]  ◄── this entry
    surface form → candidate → disambiguate → node
                    │
           ┌────────┴─────────┐
           ▼                  ▼
        resolved        UNRESOLVED / MIS-RESOLVED
        → prior &       → credit dropped, or
          credit          attached to the wrong
          accrue to       entity (name collision)
          your node
Where it acts is the whole point. Resolution operates at the pre-retrieval parse and entity layer — the same layer Schema.org for AI §2 places markup at, upstream of the Answer Loop §3 grounding/selection gate that Citability and E-E-A-T govern. Resolution does not pick the passage; it decides whose ledger the credit lands in.
Three orthogonality lines, stated so they cannot be misread:
- Resolution ≠ liftability. Making a passage quotable is Citability’s.
- Resolution ≠ trust. Whether the resolved entity is trusted is E-E-A-T’s.
- Resolution ≠ the node existing. Having a Wikidata node is Knowledge Graph Presence’s; being matched to it is here.
3. The mechanism — the resolution pipeline
Under the hood this is named-entity recognition followed by entity linking — named once as the underlying discipline, then never jargon-walked again. In GEO terms it is one pipeline:
    surface form            "Acme"   (the string on a page or in a query)
         │
         ▼
    candidate generation    which known entities could "Acme" be?
         │                  (Acme Corp · Acme Tools · Acme the band …)
         ▼
    disambiguation          context + co-occurrence + prior, pick one
         │
         ▼
    canonical entity        the node credit attaches to
Disambiguation draws on three inputs. They are not equal, and which one dominates is what §5 routes by surface.
| Input | What it supplies | Where it comes from | When it dominates |
|---|---|---|---|
| ① Explicit join keys | An unambiguous edge to the node | sameAs, structured identifiers, authoritative profile URLs | Index-integrated surfaces that parse structured data |
| ② Disambiguating context | Enough signal to collapse candidates to one | Consistent canonical name, NAP consistency, descriptive co-occurrence near every mention | When no explicit key is present (most of the open web) |
| ③ The model’s existing prior | A default pull toward the well-attested reading | How widely the entity is attested in training/retrieval (the §6 resolvability gradient) | Pure-LLM surfaces with no structured layer to read |
Input ① is the mechanism Schema.org for AI §4 routes here: sameAs is the resolution join key — Google states it is “used behind the scenes to disambiguate your organization from other organizations” (see Organization structured data). Which markup carries the sameAs key is in Schema.org for AI §4; how the join resolves is below. For the JSON-LD block, see Schema.org for AI or the Schema Implementation playbook. In prose, sameAs simply asserts “the entity at this page is the one at this Wikipedia/Wikidata URL,” and the engine treats that as a candidate-collapsing edge only when the rest of the web does not contradict it.
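The pipeline and the three inputs can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any engine is known to implement resolution: the `Entity` store, the `example.org` URL, the weights, and the `margin` threshold are all hypothetical, chosen only to show candidate generation followed by disambiguation over an explicit key (①), context overlap (②), and a prior (③).

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A known node in a hypothetical entity store."""
    node_id: str
    names: set[str]                                  # surface forms seen for this node
    context_terms: set[str]                          # descriptors that co-occur with it
    same_as: set[str] = field(default_factory=set)   # URLs asserted as identical
    prior: float = 0.0                               # how widely attested, scaled 0..1


def generate_candidates(surface: str, store: list[Entity]) -> list[Entity]:
    """Candidate generation: every node attested under this surface form."""
    key = surface.casefold()
    return [e for e in store if key in {n.casefold() for n in e.names}]


def score(entity: Entity, context: set[str], asserted_urls: set[str],
          weights: tuple[float, float, float]) -> float:
    """Combine the three inputs; the weights are illustrative assumptions."""
    explicit = 1.0 if entity.same_as & asserted_urls else 0.0          # input 1
    overlap = len(entity.context_terms & context) / max(len(context), 1)  # input 2
    return weights[0] * explicit + weights[1] * overlap + weights[2] * entity.prior


def resolve(surface: str, context: set[str], asserted_urls: set[str],
            store: list[Entity], weights=(0.5, 0.3, 0.2), margin=0.1):
    """Return the winning node_id, or None when unresolved or too close to call."""
    candidates = generate_candidates(surface, store)
    if not candidates:
        return None
    scored = sorted(((score(e, context, asserted_urls, weights), e) for e in candidates),
                    key=lambda pair: pair[0], reverse=True)
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None  # ambiguous: candidates never collapse to one
    return scored[0][1].node_id


# Hypothetical store mirroring the "Acme" collision in the diagram.
STORE = [
    Entity("acme-corp", {"Acme", "Acme Corp"}, {"software", "saas"}, prior=0.9),
    Entity("acme-tools", {"Acme", "Acme Tools"}, {"hardware", "drills", "power tools"},
           same_as={"https://example.org/wiki/Acme_Tools"}, prior=0.3),
    Entity("acme-band", {"Acme"}, {"music", "album"}, prior=0.2),
]

print(resolve("Acme", set(), set(), STORE))                    # bare name: the prior decides
print(resolve("Acme", {"drills", "hardware"}, set(), STORE))   # context decides
```

With no context and no explicit key, the bare name falls to the best-attested homonym; adding descriptors next to the mention, or an explicit key the store already holds, moves the result to the intended node. Raising `margin` turns close calls into unresolved ones rather than guesses.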
4. The levers — what makes you resolvable (concept, not runbook)
A concept-level taxonomy of resolution levers, each tagged with the §3 input it feeds. This is not a runbook — the doing (deploy markup, reconcile identity, claim nodes) is the Schema Implementation playbook’s.
| Lever | How it aids resolution | §3 input fed | Failure shape |
|---|---|---|---|
| sameAs / structured-identifier join key | Gives the engine an explicit edge to the node | ① | No explicit edge; entity never joined to the graph |
| One consistent canonical name + NAP across the web | Lets scattered mentions collapse to a single candidate | ② | Fragmented identity; candidates never collapse to one |
| Disambiguating context next to every mention (role, domain, descriptor) | Separates you from homonyms in-place | ② | Name collision unresolved; credit routed to the bigger entity |
| A claimed KG node as the resolution destination | Gives resolution something authoritative to resolve to | ①+③ | Nothing to resolve to (node mechanics → Knowledge Graph Presence) |
| Distinctive, collision-aware naming of brand/author/product | Shrinks the candidate set at the source | ② | A homonym swallows you in every candidate list |
The line that ties it back, the mirror of Brand Mentions §6’s “you earn being the named thing”: you do not declare an identity — you make one corroborated identity the only consistent reading across the web. Resolution is downstream of that consistency. Trust framing of the identity is E-E-A-T’s axis.
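The "one consistent canonical name" lever can be made concrete with a small consistency check. The sketch below is an assumption-laden illustration, not a playbook step: the channel names, the sample strings, and the normalization rule (casefold, drop punctuation, collapse whitespace) are hypothetical choices, and a real audit would also compare addresses and phone numbers.

```python
import re
from collections import Counter


def normalize(name: str) -> str:
    """Casefold, drop punctuation, collapse whitespace so trivial variants match."""
    cleaned = re.sub(r"[^\w\s]", "", name.casefold())
    return re.sub(r"\s+", " ", cleaned).strip()


def identity_report(names_by_channel: dict[str, str]) -> dict:
    """Find the dominant name variant and the channels that diverge from it."""
    if not names_by_channel:
        raise ValueError("need at least one channel")
    variants = Counter(normalize(n) for n in names_by_channel.values())
    canonical, count = variants.most_common(1)[0]
    outliers = sorted(ch for ch, n in names_by_channel.items()
                      if normalize(n) != canonical)
    return {
        "canonical": canonical,
        "consistency": count / len(names_by_channel),  # share of channels that agree
        "outliers": outliers,                          # channels fragmenting the identity
    }


# Hypothetical channel inventory for one brand.
channels = {
    "website": "Acme Tools, Inc.",
    "linkedin": "Acme Tools Inc",
    "x": "acmetoolsHQ",
    "directory": "ACME Tools Inc.",
}
report = identity_report(channels)
print(report["canonical"], report["consistency"], report["outliers"])
```

Any channel listed under `outliers` is a place where a mention may fail to collapse into the same candidate as the rest.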
5. How resolution varies by surface (invariant vs delta)
The §3 pipeline is invariant — it holds everywhere. What varies is which input dominates.
| Surface | Dominant resolution input |
|---|---|
| Google AI Overviews / AI Mode | Index + Knowledge Graph: explicit-identifier (①) driven, KG-backed |
| Google Gemini | Entity-graph-backed — co-occurrence + KG node (②+③) visible |
| ChatGPT / Perplexity (live fetch) | Model prior + retrieved on-page context (②+③); JSON-LD not parsed as a graph at answer time (Schema.org for AI §5) — the explicit join survives via the prior/KG, not by reading your markup during the fetch |
One routed line, not expanded: the same brand resolving across languages (zh ↔ en) is not re-derived here — cross-language entity binding is Multilingual GEO’s.
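The invariant-versus-delta idea can be illustrated by holding the candidates fixed and swapping only the weights. Everything numeric here is hypothetical: the three profiles are invented to echo the table's "dominant input" column, not measured from any engine, and each candidate is reduced to a made-up triple of (explicit key, context, prior) signals.

```python
# Hypothetical weights per surface: (explicit key, context, prior).
# Illustrative numbers only; they are not measurements of any real engine.
PROFILES = {
    "index_integrated": (0.6, 0.2, 0.2),   # explicit identifier dominates
    "entity_graph": (0.2, 0.4, 0.4),       # co-occurrence and KG node dominate
    "live_fetch_llm": (0.0, 0.5, 0.5),     # markup not read at answer time
}


def pick(surface: str, candidates: dict[str, tuple[float, float, float]]) -> str:
    """Same pipeline everywhere; only the weighting of the inputs changes."""
    weights = PROFILES[surface]

    def total(signals: tuple[float, float, float]) -> float:
        return sum(w * s for w, s in zip(weights, signals))

    return max(candidates, key=lambda name: total(candidates[name]))


# A small brand with an explicit key but thin on-page context,
# colliding with a better-attested homonym.
thin = {"acme-tools": (1.0, 0.2, 0.3), "acme-corp": (0.0, 0.2, 0.9)}
# The same brand after adding disambiguating context next to its mentions.
rich = {"acme-tools": (1.0, 0.9, 0.3), "acme-corp": (0.0, 0.2, 0.9)}

for surface in PROFILES:
    print(surface, pick(surface, thin), pick(surface, rich))
```

Under these assumed profiles the explicit key is enough on the index-integrated surface, while on the other two the smaller brand only wins once the context signal is strong, which is the table's point about the join surviving through context and prior when markup is not read during the fetch.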
6. What the evidence says — and what it does not
The mechanism direction is well-attested; the brand-level dose-response is not. Read this table the way the site reads Aggarwal — for direction, not a coefficient.
| What holds | The bounded reading |
|---|---|
| Recall and handling of an entity rise sharply with how widely it is attested — popular entities resolve reliably, long-tail ones do not (Kandpal et al., arXiv:2211.08411; Mallen et al., ACL 2023) | These measure factual QA on Wikidata facts, with popularity proxied by Wikipedia pageviews — not brand entity resolution. The transfer is analogical, not direct, exactly the bound Brand Mentions §4 states for the same papers, read there for the prior, here for the resolvability gradient |
| Index-integrated surfaces resolve via an explicit identifier/KG layer: Google states sameAs disambiguates your organization from others (Organization docs; the model dates to Knowledge Graph, 2012) | That is eligibility-grade resolution, not a ranking boost (the Schema.org for AI §6 line) |
| Industry practice now treats entity disambiguation (“entity drift,” “identity collapse”) as a first-order AI-search concern (Search Engine Land, 2026) | Practitioner corroboration that the signal is real — not independent proof of a mechanism or an effect size |
Honest gap: Generative Engine Optimization’s headline lever, Aggarwal et al. (KDD ’24, arXiv:2311.09735; paper summary), measured on-page content rewrites (citing sources, adding statistics and quotations), not entity resolution. The up-to-40% figure does not transfer to “improve entity recognition.” Borrowing it here would be the exact over-claim the siblings warn against.
The position, the reciprocal of Schema.org for AI’s “declared, not rewarded” and E-E-A-T §6’s “earned, not annotated”: resolution is corroborated, not asserted — one sameAs does not resolve you; one consistent identity attested across sources does.
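"Corroborated, not asserted" can be sketched as a simple vote over independent attestations. This is a minimal illustration under assumptions: the `(source_domain, node_id)` pairs, the `min_sources` threshold, and the rule that the asserted node must beat its strongest rival are hypothetical simplifications, not a documented engine behavior.

```python
from collections import Counter


def corroborated(asserted_node: str,
                 attestations: list[tuple[str, str]],
                 min_sources: int = 2) -> bool:
    """Accept an asserted join only if independent sources back it.

    attestations: (source_domain, node_id) pairs observed off-site for the name.
    """
    # Deduplicate so one source repeating itself counts once per node.
    votes = Counter(node for _, node in set(attestations))
    support = votes[asserted_node]
    strongest_rival = max((n for node, n in votes.items() if node != asserted_node),
                          default=0)
    return support >= min_sources and support > strongest_rival


# Hypothetical off-site evidence for the name "Acme".
evidence = [
    ("news.example", "acme-tools"),
    ("directory.example", "acme-tools"),
    ("news.example", "acme-tools"),   # repeat from the same source, counted once
]

print(corroborated("acme-tools", evidence))    # backed by two independent sources
print(corroborated("famous-acme", evidence))   # asserted join with no backing
```

A join asserted toward a node that no other source attests gets zero support, and a join that other sources contradict loses to its rival, which is the sense in which a single sameAs is treated as a claim to check rather than a fact.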
7. Anti-patterns — identity ambiguity and false joins
The errors this entry exists to prevent — mirror of Brand Mentions §8 and Schema.org for AI §7.
| Misread | Why it looks right | Why it’s wrong |
|---|---|---|
| “We’re named everywhere, so we must be resolved” | Volume reads like authority | An unmanaged name collision splits the prior across homonyms; resolution, not volume, is the gate |
| “A different name/handle per channel is fine for branding” | Looks like flexible marketing | Fragments the candidate set so it never collapses to one node (§3 input ②) |
| “Fabricate sameAs to a famous node” | Looks like an instant join | Fails corroboration the same way fake authorship fails E-E-A-T §7 and fabricated Organization fails Schema.org for AI §7: a false join is a detectable lie |
| “Markup alone resolves us” | Looks sufficient | Pure-LLM surfaces do not parse JSON-LD at answer time (§5); the explicit key needs corroboration to survive |
| “We have a Wikipedia page, so we’re resolved” | Looks like the finish line | The node existing is Knowledge Graph Presence’s; being matched to it from your mentions is the separate work this entry owns |
The load-bearing line: the failure mode is rarely “no identity” — it is the wrong one, or a split one. Consistency is the fix, not volume.
8. Why this matters for GEO + how to act
Resolution is one of the upstream joins that makes any downstream credit reachable at all — pair it with groundability (Citability) and the off-site prior (Brand Mentions). This entry is the concept; the doing is the playbook.
| Your intent | First stop |
|---|---|
| Deploy sameAs / fix identity markup | Schema Implementation |
| Get the structured node itself | Knowledge Graph Presence |
| Earn the off-site mentions that feed resolution | Brand Mentions |
| Understand which markup feeds the entity layer | Schema.org for AI |
| Check the trust framing of the resolved entity | E-E-A-T |
| Resolve the same entity across languages | Multilingual GEO |
| See where this sits in the loop | Answer Loop |
| The method that ties it together | Generative Engine Optimization |
References
Academic:
- Kandpal, N., Deng, H., Roberts, A., Wallace, E. & Raffel, C. (2023). Large Language Models Struggle to Learn Long-Tail Knowledge. ICML 2023 (PMLR v202). arXiv:2211.08411
- Mallen, A., Asai, A., Zhong, V., Das, R., Khashabi, D. & Hajishirzi, H. (2023). When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories. ACL 2023. ACL Anthology · arXiv:2212.10511
- Aggarwal, P. et al. (2024). GEO: Generative Engine Optimization. KDD ’24. arXiv:2311.09735 · paper summary (boundary reference; entity resolution is not a tested variable)
Official:
- Google — Introducing the Knowledge Graph: things, not strings (2012-05-16) — the “entities, not strings” model
- Google Search Central, Organization structured data: sameAs is “used behind the scenes to disambiguate your organization from other organizations”
- Schema.org, sameAs: “URL of a reference Web page that unambiguously indicates the item’s identity”
Industry:
- Search Engine Land — Why entity authority is the foundation of AI search visibility (Benu Aggarwal, 2026-03-16)
Sources
Primary
- Large Language Models Struggle to Learn Long-Tail Knowledge (Kandpal, Deng, Roberts, Wallace & Raffel, ICML 2023) · arXiv / ICML 2023 (PMLR v202) · 2023-07-27
- When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories (Mallen et al., ACL 2023) · ACL 2023 (Long Papers) · 2023-07-02
- Introducing the Knowledge Graph: things, not strings · Google (Amit Singhal, The Keyword) · 2012-05-16
- Organization structured data (sameAs disambiguation) · Google Search Central · 2026-04-15
- sameAs — Schema.org property · Schema.org
- GEO: Generative Engine Optimization (Aggarwal et al., KDD '24) · arXiv / ACM SIGKDD · 2024-08-25
Secondary
- Why entity authority is the foundation of AI search visibility · Search Engine Land (Benu Aggarwal)