Multilingual GEO

Quick facts

What it is: GEO applied across a language boundary — the study of which steps of the answer loop are language-invariant and which are not, and what to do about the ones that are not
The four axes: Source pool, entity binding, chunk shape, trust pool — all four vary across languages; the GEO loop's shape (retrieve → ground → synthesize → attribute) does not
The load-bearing fact: Per-language source pools — English and Chinese AI engines retrieve from radically different webs, and appearing in one is not evidence you appear in the other
The hreflang myth: Hreflang is hygiene, not the mechanism — a site with perfect hreflang and zero zh-pool presence will not be cited in zh; a site with broken hreflang but strong zh-pool presence will be
The honest bound: No published rigorous benchmark (as of 2026-05) of zh-engine vs en-engine citation-preference deltas — read direction, not coefficient

1. What multilingual GEO is

Multilingual GEO is the GEO discipline applied across a language boundary — the study of which steps of the answer loop are language-invariant and which are not, and what to do about the ones that are not.

Definition (GEO Wiki working definition): multilingual GEO is the practice of optimizing for AI engines that retrieve, ground, and cite from a per-language slice of the web, with the recognition that four mechanisms — source pool, cross-language entity binding, chunk citability, and trust corroboration — vary with language even though the underlying answer-loop shape does not.

The four mechanisms that vary, in one phrase each:

Source pool — retrieval is per-language: the corpus an engine pulls from is selected by the query’s language, not unified across languages.
Entity binding — your brand has a surface form in each language but a single canonical entity; whether the engine joins surface to canonical across languages is its own gate.
Chunk shape — citability heuristics (paragraph density, sentence length, punctuation, structure) were calibrated on language-specific corpora; English-trained extraction does not transfer cleanly to Chinese, and vice versa.
Trust pool — what counts as an authoritative corroborating source differs across language pools; E-E-A-T is a stable concept, the observable pattern that satisfies it is not.

Language is a routing dimension inside each step of the generative engine optimization loop, not a translation problem layered on top of an English-default workflow. The framework below is general; the worked examples (zh ↔ en) are the pair the field has the most live evidence for.

2. The four axes — at a glance

The table below is the frame the rest of this entry hangs off. Each row is covered in its own § below.

Axis	Language-invariant claim	Language-variant fact	See
Source pool	Retrieval is from a corpus, not a global pool	Which corpus depends on the query’s language — engines retrieve a per-language web slice	§3
Entity binding	Credit attaches to a canonical entity, not a string	The surface form differs across languages; whether the engine joins surface to canonical across the boundary is its own gate	§4
Chunk extraction	Citable content is structured, scannable, attributable	The extraction heuristics calibrated on English punctuation and structure do not transfer cleanly to Chinese	§5.1
Attribution form	Citations encode “where the claim came from”	The form of attribution (numbered footnote vs in-flow source name) is language-specific	§5.2
Trust pool	E-E-A-T is corroborated, not asserted	The pool of corroborating sources is per-language; authority cues differ	§5.3

The rest of this entry walks each axis once, starting with the structural one.

3. Per-language source pools — and the engines that draw from them

An AI engine builds its answer from a retrieved set of sources, and that set is selected from the language-matched slice of the web, not from a global corpus that languages share. This is the headline structural fact: appearing in an en answer is not evidence you appear in a zh answer, and vice versa, because they are different retrieval events over different webs.

                  "best CRM for SMB"
                        │
            ┌───────────┴───────────┐
            ▼                       ▼
      en query path           zh query path
            │                       │
       retrieves from          retrieves from
       en web slice            zh web slice
       (Wikipedia-en,          (Wikipedia-zh, Baidu
       vendor docs,            Baike, Zhihu, vendor
       SE Land, Reddit,        cn docs, Weixin 公众号
       G2, Capterra…)          articles, 36kr, …)
            │                       │
            ▼                       ▼
       en answer               zh answer
       (different sources, often different conclusions)

The two pools are not a translation of each other. Wikipedia-zh runs at roughly a fifth of Wikipedia-en’s article count; Baidu Baike is significantly larger than Wikipedia-zh inside China but invisible to Western engines; Zhihu and WeChat 公众号 content are primary discovery surfaces with no exact Western analogue; vendor documentation translated into zh often lags or is missing. The same query against the same brand produces, in practice, two different SERPs and two different answers.
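One way to make "two different answers" observable is to compare the hostnames each answer cites. The sketch below is illustrative only: the citation lists are hypothetical, `cited_domains` and `pool_overlap` are names invented here, and Jaccard overlap is just one reasonable choice of measure.

```python
from urllib.parse import urlparse


def cited_domains(urls):
    """Reduce a list of cited URLs to a set of hostnames."""
    hosts = set()
    for url in urls:
        host = urlparse(url).hostname
        if host:
            hosts.add(host)
    return hosts


def pool_overlap(en_urls, zh_urls):
    """Jaccard overlap of cited hostnames: 0.0 is disjoint, 1.0 is identical."""
    en, zh = cited_domains(en_urls), cited_domains(zh_urls)
    union = en | zh
    return len(en & zh) / len(union) if union else 0.0


# Hypothetical citation lists for the same query asked in each language.
en_citations = [
    "https://en.wikipedia.org/wiki/Customer_relationship_management",
    "https://www.g2.com/categories/crm",
    "https://www.reddit.com/r/smallbusiness/",
]
zh_citations = [
    "https://zh.wikipedia.org/wiki/客户关系管理",
    "https://baike.baidu.com/item/CRM",
    "https://www.zhihu.com/question/123",
]

print(pool_overlap(en_citations, zh_citations))  # 0.0: no shared hostnames
```

An overlap near zero on matched-intent queries is the observable form of the source-pool asymmetry. It is a per-query reading of direction, not a benchmark.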

The engines that read each pool, with what crawlers they admit:

Engine	Pool	Operator	Crawler UA(s) to admit
Google AI Overviews	en-dominant, multilingual via Google Search	Google	`Googlebot`, `Google-Extended`
ChatGPT search	en-dominant, multilingual retrieval	OpenAI	`GPTBot`, `OAI-SearchBot`, `ChatGPT-User`
Perplexity	en-dominant, multilingual retrieval	Perplexity	`PerplexityBot`, `Perplexity-User`
Google Gemini	en-dominant, multilingual via Search	Google	`Google-Extended`
Baidu AI Search	zh, integrated with Baidu index	Baidu	`Baiduspider`
Qianwen (formerly Tongyi)	zh, Qwen-backed	Alibaba	Not publicly documented as a web-crawler UA
Doubao	zh, ByteDance ecosystem	ByteDance	`Bytespider` (documented via Volcengine)
Yuanbao	zh, Hunyuan + WeChat 公众号 corpus	Tencent	Not publicly documented
DeepSeek	zh+en, API-served chat	DeepSeek	No public web-crawler UA — model is API-served, not an indexing engine

A few load-bearing details to read carefully. Yuanbao retrieves from WeChat 公众号 content — a corpus no Western engine can reach, and the cleanest single example of the source-pool asymmetry. Baiduspider self-documents at www.baidu.com/search/spider.html and respects robots.txt directives; the other Chinese vendors are less transparent. DeepSeek’s product is a chat surface backed by API-served models, not a web retriever — there is no DeepSeek crawler to admit, and being “cited by DeepSeek” usually means being in the model’s training data or attached via a third-party retrieval layer, not retrieved live.

The 2025 Chinese AI-search market is structurally different from the West — practitioner coverage in 36kr (Feb 2025) tracks 230M+ Chinese AI users and a three-way ecosystem split between traditional Baidu/Google search, social search dominated by Xiaohongshu (~600M daily searches inside the app), and AI-native assistants. The Xiaohongshu shape, in particular, has no exact Western analogue — discovery happens inside a social-content app, not at a search box.

Blocking GPTBot does not block Baiduspider; blocking Baiduspider does not block GPTBot. Each pool is gated by its own crawlers, and absence from one pool is invisible from the other pool’s tooling — your en-side rank tracker will not see your zh-side coverage gap.
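A minimal sketch of a per-pool crawler check, using Python's standard `urllib.robotparser`. The robots.txt content and the `crawler_access` helper are hypothetical; the user-agent tokens are the ones listed in the engine table above.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens from the engine table, grouped by the pool they gate.
POOL_CRAWLERS = {
    "en": ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Googlebot"],
    "zh": ["Baiduspider", "Bytespider"],
}

# Hypothetical robots.txt: blocks GPTBot and Bytespider, allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt, url):
    """Return {pool: {user_agent: allowed}} for one URL under one robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        pool: {ua: parser.can_fetch(ua, url) for ua in agents}
        for pool, agents in POOL_CRAWLERS.items()
    }


report = crawler_access(SAMPLE_ROBOTS, "https://www.example.com/zh/pricing")
for pool, agents in report.items():
    blocked = [ua for ua, allowed in agents.items() if not allowed]
    print(pool, "blocked:", blocked)
```

This only checks robots.txt rules. It says nothing about firewall, CDN region, or whether a crawler actually visits, so treat a clean result as necessary rather than sufficient.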

4. Cross-language entity binding

A brand has a surface form in each language but a single canonical entity. Whether the engine joins surface to canonical across the boundary is its own gate — the harder version of the resolution problem entity recognition owns for the single-language case.

The mental model: one canonical entity, multiple surface forms across languages. The Wikidata Q-id is the language-agnostic spine; the language-specific Wikipedia pages (en-Wikipedia, zh-Wikipedia, etc.) are surface attestations of the same node. Wikidata has centralized the interlanguage links between Wikipedia articles about the same topic since 2013, with labels and descriptions stored in any number of languages and a fallback chain when a label is missing in the requested one (Wikidata Help:Multilingual).

The explicit join key for cross-language resolution is sameAs pointing at multiple language Wikipedias plus the Wikidata URI. Schema.org’s own definition is exactly this shape — sameAs is the “URL of a reference Web page that unambiguously indicates the item’s identity. E.g. the URL of the item’s Wikipedia page, Wikidata entry, or official website” (schema.org/sameAs). For the JSON-LD pattern, see Schema.org for AI; the cross-language case is the hardest test of the join — multiple surface forms must collapse onto one node without any of them contradicting the others.
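As a rough illustration of that join, the sketch below builds a schema.org Organization node whose `sameAs` lists the Wikidata URI and both language Wikipedias. Everything concrete in it is a placeholder: the brand "Acme", its zh rendering, the URLs, and the Q-id are hypothetical, and `organization_jsonld` is a helper named here for illustration.

```python
import json


def organization_jsonld(name, alternate_names, url, same_as):
    """Build a schema.org Organization node with an explicit sameAs join."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "alternateName": alternate_names,  # other-language surface forms
        "url": url,
        "sameAs": same_as,  # language-agnostic and per-language attestations
    }


# Placeholder values only: none of these identifiers refer to a real entity.
node = organization_jsonld(
    name="Acme",
    alternate_names=["艾克米"],
    url="https://www.example.com/",
    same_as=[
        "https://www.wikidata.org/entity/Q0000000",
        "https://en.wikipedia.org/wiki/Acme_(example)",
        "https://zh.wikipedia.org/wiki/艾克米",
    ],
)

# ensure_ascii=False keeps the zh surface form readable in the emitted JSON-LD.
print(json.dumps(node, ensure_ascii=False, indent=2))
```

The same node, with `name` and `alternateName` swapped, would go on the zh page so the bridge is stated from both sides. The markup is a claim; it still needs the corroboration described below.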

The failure shapes, ordered by frequency:

Failure	Why it splits identity	Fix shape
Romanization / transliteration drift	The brand appears as three different zh strings — 拼音 (“Aikemi”), 译名 (“艾克米”), 商标译名 (“亚克美”) — none of which an engine knows refer to the same node as the en “Acme”	Pick one canonical zh rendering, use it consistently in every zh-pool channel, and bridge to the en form with `sameAs` on both sides
No zh-language presence at all	Only the en surface form exists; a zh query has no zh-side string to land on, let alone resolve	Earn a zh-language attestation chain — a Wikipedia-zh page, a Baidu Baike entry, vendor docs in zh, mention by zh-pool authorities — before optimizing markup
A legitimately different en and zh brand (localized product name) with no `sameAs` bridge	The engine treats them as two unrelated entities and splits the prior across two nodes	Add `sameAs` on both pages and to Wikidata; corroborate the bridge with editorial / press coverage that names both forms together
Mention pool that never overlaps	en and zh mentions both exist but never co-occur with the join key in any single source	Engineer at least one well-attested artefact — a Wikidata entry, a bilingual press release, a Wikipedia page — where both surface forms are stated together

The honest bound, the cross-language counterpart of entity recognition §6’s long-tail finding: resolvability rises with how widely an entity is attested in BOTH language pools. Cross-lingual entity linking research treats Wikidata as the spine and successfully links mentions in 100+ languages to a shared knowledge base of ~20M entities (Botha, Shan & Gillick, EMNLP 2020) — but its measured success is on Wikipedia abstracts, not long-tail brand resolution on the open web. A famous global brand resolves reliably across the boundary; a long-tail brand often does not, even with sameAs, because there is nothing on either side for the join to corroborate against.

5. Citability, attribution density, and trust signals are not language-invariant

Three concerns that the field treats as language-invariant but that are not: how a passage extracts, how attribution reads, and what counts as a trusted source. The shared mechanism is the same in each case: the heuristics that calibrate it were trained or tuned on a specific language’s web. The concepts hold; the observable pattern that satisfies each in a given language does not transfer.

5.1 Chunk shape and citability

The chunk-extraction heuristics behind citability — paragraph length, sentence density, header cadence, scannability — were calibrated on English-dominant corpora. Chinese chunk shape differs structurally:

Dimension	English norm	Chinese norm	Effect on en-trained extraction
Sentence length	Short, period-terminated	Longer, comma-chained (“流水句”)	zh sentences register as paragraph-length, fail per-sentence-quotability heuristics
Punctuation	ASCII `.` `,` `;`	Fullwidth `。` `，` `；`, no spaces	Character-span and tokenization assumptions break; chunk boundaries mis-detected
Paragraph density	One idea per paragraph is normative	Higher idea density per paragraph is normative	zh paragraphs read as “denser” and may fail short-passage extraction even when they are highly readable
Inline structure	Bullet lists, sub-headers, frequent whitespace	Prose-first, structure embedded inline	en heuristics under-weight a well-organized zh page that omits Western scaffolding

The fix is not “mimic English structure” — that produces stiff, AI-flavored zh that reads worse to humans without reliably reading better to machines. The fix is explicit structural scaffolding inside zh idiom: real H3 anchors, a one-sentence lead per paragraph, summary tables, key claims surfaced before the supporting prose. Western scaffolding inside native zh rhythm, not the other way around.
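The punctuation row of the table can be shown with a toy sentence splitter. This is a deliberately naive sketch, not how any named engine chunks text: `split_en` assumes ASCII terminators followed by whitespace, `split_zh` assumes fullwidth terminators, and both sample strings are invented.

```python
import re

# English-style rule: a sentence ends at . ! or ? followed by whitespace.
EN_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Chinese-style rule: a sentence is a run of text up to a fullwidth terminator.
ZH_SENTENCE = re.compile(r"[^。！？]+[。！？]?")


def split_en(text):
    """Split on ASCII sentence terminators followed by whitespace."""
    return [s for s in EN_BOUNDARY.split(text.strip()) if s]


def split_zh(text):
    """Split on fullwidth sentence terminators; no whitespace is required."""
    return [s for s in ZH_SENTENCE.findall(text.strip()) if s]


en_text = "It tracks customers. It tracks deals! What does it cost?"
zh_text = "我们的系统支持客户管理，也支持销售跟踪。它适合中小企业使用！价格是多少？"

print(len(split_en(en_text)))  # 3
print(len(split_en(zh_text)))  # 1: the en rule sees one undivided block
print(len(split_zh(zh_text)))  # 3
```

The en rule applied to zh text returns the whole passage as a single "sentence", which is the failure the table describes: a readable three-sentence paragraph registers as one over-long chunk.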

5.2 Attribution density and citation form

Western practitioner content typically marks attribution with numbered inline citations or footnote-style superscripts ([1], [2]). Chinese practitioner content frequently inlines the source name in prose — “据 IDC 报告” (“according to IDC’s report”), “Gartner 数据显示” (“Gartner data show”) — without any numbered cite at all. The argument citation vs mention makes about the link between named-source density and citability holds in both languages, but the form that satisfies it differs.

An engine calibrated on the numbered-citation pattern may under-count zh attributions even when they are present and well-formed. A zh content strategy that ports English’s footnote habit verbatim into zh tends to read as unnatural; one that uses idiomatic in-flow naming still satisfies the citability gate, just on the zh side.
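The under-counting risk can be sketched with two detectors: one for numbered citations, one for the two in-flow phrasings quoted above. The patterns are illustrative and far from exhaustive, and they assume the source name is written in Latin letters, as in the IDC and Gartner examples.

```python
import re

# Numbered-citation form: [1], [2], ...
NUMBERED_CITE = re.compile(r"\[\d+\]")

# In-flow forms taken from the two example phrasings in the text.
# Assumption: the source name is a Latin-letter name or acronym.
INFLOW_CITES = [
    re.compile(r"据\s*([A-Za-z]+)\s*报告"),   # "according to X's report"
    re.compile(r"([A-Za-z]+)\s*数据显示"),    # "X data show"
]


def count_numbered(text):
    """Count bracketed numeric citations."""
    return len(NUMBERED_CITE.findall(text))


def inflow_sources(text):
    """List source names attributed in prose, pattern by pattern."""
    names = []
    for pattern in INFLOW_CITES:
        names.extend(pattern.findall(text))
    return names


en_text = "The market keeps growing [1]. Adoption is rising [2]."
zh_text = "据 IDC 报告，市场规模持续增长。Gartner 数据显示，采用率正在上升。"

print(count_numbered(en_text), inflow_sources(en_text))
print(count_numbered(zh_text), inflow_sources(zh_text))
```

A detector that only knows the bracketed form scores the zh passage at zero attributions, even though it names two sources. Counting both forms per language gives a fairer attribution-density reading.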

5.3 Trust pool and corroboration

E-E-A-T’s pool of “what counts as an authoritative source” is per-language. A short comparison:

Tier	English-pool examples	Chinese-pool examples
Encyclopedia / KB	Wikipedia-en, Wikidata	Wikipedia-zh, Baidu Baike, Wikidata
Quality press	NYT, FT, The Economist, WSJ	Caixin / 财新, 第一财经, 南方周末
Vertical authority	TechCrunch, Stratechery, MIT Tech Review	36氪, 虎嗅, IT 之家
Practitioner long-form	Substack, vertical blogs, Hacker News discussion	Zhihu, WeChat 公众号 long-form
Government / official	gov.uk, ec.europa.eu, .gov	gov.cn, ministry sites, 央视网

Authority cues differ. Government-source weight is higher inside the zh pool than the en pool. Platform-native weight (Zhihu, WeChat 公众号, Xiaohongshu posts) is much higher inside the zh pool because the public-square-via-platform pattern dominates Chinese discourse. KOL weight in a vertical can outweigh institutional weight inside the zh pool in ways less common inside the en pool. Trust corroboration is per-language, not because trust is a different concept across languages, but because the stock of trusted attestors is.

Citability and trust are stable concepts; the observable pattern that satisfies each in a given language is not.

6. The technical layer — hreflang, URL i18n, and regional crawlers

Hreflang is hygiene; per-language source pools are the mechanism. This § exists to keep that order honest.

Surface	What it does	GEO weight
`hreflang` annotations	Routes language and region variants to the right SERP slot in traditional Google Search; Google explicitly states it does not use hreflang for language detection — its own algorithms do (Google Search Central)	Necessary on multilingual sites for traditional-search clarity. Weaker on pure-LLM surfaces because HTML head metadata and JSON-LD are not parsed as a graph at answer time (see Schema.org for AI §5). Hygiene-grade, not lever-grade
URL i18n shape — subfolder (`/zh/...`), subdomain (`zh.example.com`), or ccTLD (`example.cn`)	Determines crawler access boundaries, hosting/CDN region, and how link equity consolidates between language variants	The choice has SEO and operational consequences (a `.cn` site usually needs an ICP filing; a subdomain is treated as a separate property for some signals). For GEO, the larger consequence is which language pool you end up in by default, not the URL shape itself. Subfolder is the modal recommendation
Regional crawler accessibility	Whether each pool’s gating crawlers can reach you at all — GPTBot, PerplexityBot, Google-Extended on the Western side; Baiduspider, Bytespider on the Chinese side; firewall / CDN region / robots.txt per-UA on each	Load-bearing: no crawl, no retrieval. A site invisible to Baiduspider has zero zh-pool presence regardless of how good its zh content is. The first check on any multilingual GEO audit

A site with perfect hreflang and zero presence in the zh source pool will not be cited in zh. A site with broken hreflang but a strong zh-pool attestation chain will be. The technical layer matters because crawler access is its own gate; hreflang matters because traditional search still mediates a meaningful share of AI engines’ retrieval. Neither is the mechanism — the mechanism is which sources land in the retrieved set.
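For completeness, the hygiene layer itself is small. The sketch below emits `<link rel="alternate" hreflang="...">` tags for a subfolder layout; the domain, paths, and the `hreflang_links` helper are hypothetical, and getting this right changes nothing about source-pool presence.

```python
from html import escape


def hreflang_links(base_url, variants, default_lang):
    """Return alternate-link tags for each language variant plus x-default."""
    root = base_url.rstrip("/")
    tags = []
    for lang, path in variants.items():
        href = escape(root + path, quote=True)
        tags.append(f'<link rel="alternate" hreflang="{lang}" href="{href}" />')
    # x-default points at the variant served when no language matches.
    default_href = escape(root + variants[default_lang], quote=True)
    tags.append(
        f'<link rel="alternate" hreflang="x-default" href="{default_href}" />'
    )
    return tags


# Hypothetical subfolder layout: en at the root, zh under /zh/.
tags = hreflang_links(
    "https://www.example.com/",
    {"en": "/pricing", "zh": "/zh/pricing"},
    default_lang="en",
)
print("\n".join(tags))
```

Each language variant should carry the full set of tags, including a reference to itself, so the annotations are reciprocal.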

For the per-language file alongside robots.txt, see llms.txt; for the broader crawler-access discipline, see AI Crawlers.

7. What the evidence says — and what it does not

The mechanism direction (pools differ, binding fragments, extraction varies) is well-attested. The brand-level coefficient — how much a zh-engine cites differently from an en-engine, for what kinds of sources, by what factor — is not rigorously published.

What holds	The bounded reading
Multilingual LLMs systematically over-select English sources in retrieval-augmented generation — high-resource languages dominate monolingual knowledge extraction, and English benefits from a structural selection bias in cross-lingual knowledge selection (Wu et al., arXiv:2410.21970)	Measured on RAG knowledge-selection benchmarks, not on deployed-product brand citation behavior. The direction (English bias is structural, not stylistic) generalizes; a specific en-vs-zh citation coefficient for your brand does not port from this
Cross-lingual entity linking uses Wikidata Q-ids as a language-agnostic backbone — dual-encoder models successfully link mentions in 100+ languages to a single KB of ~20M entities (Botha, Shan & Gillick, EMNLP 2020)	Measured on Wikipedia abstracts and a curated multilingual benchmark, not on long-tail brand resolution on the open web. The mechanism (Wikidata is the spine; `sameAs` is the explicit join) transfers; the success rate on a specific long-tail brand is bounded by attestation in both language pools, not by markup alone
GEO’s headline +40% lift was measured on English — Aggarwal et al. tested rewritten English content against English queries on an internal engine and Perplexity (paper summary · arXiv:2311.09735)	The number does not transfer cross-lingually. The direction (content substance — citations, statistics, quotations — beats keyword tricks) likely generalizes; the +40% is a per-method, per-domain upper bound on English in 2023–24 engines
The Chinese AI-search ecosystem is structurally different — three-way split between traditional Baidu/Google, social Xiaohongshu, and AI-native assistants, with ~230M Chinese AI users and Xiaohongshu doing ~600M in-app daily searches (36kr, Feb 2025)	Practitioner / market corroboration that the source pool differs in shape, not just in language. Not measurement of a citation-preference delta on a specific brand
Translation-only localization is increasingly failing under AI mediation — AI engines retrieve and normalize content across languages before ranking; freshness-driven dominance lets faster-updating markets override others globally (Search Engine Land, Hunt, Jan 2026)	Western practitioner corroboration of the structural claim. Direction over coefficient

The honest gap, stated plainly: there is no published rigorous benchmark, as of 2026-05, of zh-engine vs en-engine citation-preference deltas at the brand level. No public study systematically compares ChatGPT, Perplexity, or Gemini answers against Doubao, Yuanbao, or Baidu AI Search on matched-intent queries for the same brand in each engine’s dominant language. Anyone who quotes a precise percentage for “zh engines cite X% more brand-named sources than en engines” is over-claiming. Read the direction (the four axes really do vary), build for it, and resist importing en coefficients into zh decisions.

8. Anti-patterns — multilingual misreads

Misread	Why it looks right	Why it’s wrong
“Just translate the en entry and you’re done”	Translation looks like the action	Fragments entity binding (no `sameAs` bridge, no zh-side attestation), ignores per-language source pool, and a literal en→zh translation that imports Western paragraph rhythm reads as stiff zh and chunks worse than well-written native zh
“Sub-folder vs subdomain vs ccTLD is the load-bearing decision”	Looks technical and decisive	It is hygiene with operational consequences; the source-pool argument applies regardless. A perfectly-shaped subfolder structure with no zh-pool attestation is still zero zh presence
“Hreflang fixes multilingual GEO”	Hreflang is the most googleable multilingual-SEO surface	Hreflang is for traditional search routing, and Google states it does not even use hreflang for language detection. On pure-LLM surfaces, head-metadata-as-graph is not parsed at answer time. The join is the source pool and the entity binding, not the head tag
“If Baidu, Tongyi, or Doubao don’t expose a SERP API, optimization is impossible”	Looks like a measurement ceiling	The source-pool argument tells you what to optimize regardless of measurement tooling. Earn zh-side attestation, fix entity binding, scaffold for zh chunk shape; the inputs are the same whether you can measure the output or not
“One brand name, used in en everywhere, will resolve in zh”	Looks consistent	Without zh-side attestation, a zh query has nothing to resolve to. `sameAs` is a claim; resolution is corroborated, not asserted
“My en page ranking in Google AIO means my zh page will rank in Qianwen”	Looks transitive	They are disjoint retrieval events over disjoint pools, against different engines, in different jurisdictions. Cross-pool transitivity is not a default; it has to be earned, pool by pool

The failure mode is rarely “we forgot to translate.” It is treating one of the four axes from §2 as language-invariant when it is not.

9. Why this matters for GEO + how to act

Multilingual GEO is not a separate discipline — it is GEO done once per language pool, with the four axes from §2 as the per-pool audit checklist.

Your intent	First stop
Audit per-language source-pool presence	GEO Audit playbook (sharded by language)
Make zh pages structurally extractable	Citability playbook (when shipped); read §5.1 above for the pattern
Bind brand identity across languages	Entity Recognition + the Schema Implementation playbook
Earn zh-pool mentions	Brand Mentions, read with the mention pool split per language
Get the structured node (Wikidata, language Wikipedias)	Knowledge Graph Presence
Calibrate the en-pool trust signals	E-E-A-T
See which engines draw from which pool	§3 above + Generative Engine for the per-engine picture
Place this in the loop	Answer Loop
The method that ties it together	Generative Engine Optimization

The practical reading: run the four-axis audit for each language pool you actually serve. Most teams discover that they have one pool well-resourced and another pool either invisible or actively miscalibrated — usually because they assumed translation plus hreflang covered the rest.
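The per-pool audit can be kept as a very small record, one per language pool. The sketch below is one possible shape, not a prescribed tool: `PoolAudit`, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class PoolAudit:
    """One row per language pool; each flag is one of the four axes from §2."""
    pool: str
    source_pool_presence: bool   # are you attested inside this pool's web slice?
    entity_binding: bool         # does the surface form join to the canonical entity?
    chunk_shape: bool            # is the content scaffolded for this language's idiom?
    trust_corroboration: bool    # do this pool's authorities corroborate you?

    def gaps(self):
        """Names of the axes that failed the audit, in declaration order."""
        return [
            f.name
            for f in fields(self)
            if f.name != "pool" and not getattr(self, f.name)
        ]


# Hypothetical audit results for a brand that is strong in en and thin in zh.
audits = [
    PoolAudit("en", True, True, True, True),
    PoolAudit("zh", False, False, True, False),
]

for audit in audits:
    print(audit.pool, audit.gaps())
```

Keeping the record per pool makes the common failure visible at a glance: one pool with no gaps, and another where translation covered chunk shape and nothing else.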

References

Academic:

Wu, S., Tang, S., Yang, J., Wang, S., Jia, R., Yu, S., Yao, S. & Su, J. (2024). Not All Languages Are Equal: Insights into Multilingual Retrieval-Augmented Generation. arXiv:2410.21970
Botha, J. A., Shan, Z. & Gillick, D. (2020). Entity Linking in 100 Languages. EMNLP 2020. ACL Anthology
Aggarwal, P. et al. (2024). GEO: Generative Engine Optimization. KDD ’24. arXiv:2311.09735 · paper summary (boundary reference; English-only measurement)

Official / standards:

Schema.org — sameAs — “URL of a reference Web page that unambiguously indicates the item’s identity”
Google Search Central — Tell Google about localized versions of your page — hreflang annotation methods and the explicit “Google does not use hreflang for language detection” note
Wikidata — Help:Multilingual — multilingual labels, fallback chains, and centralized interlanguage links

Chinese AI engines (official surfaces):

Baidu — chat.baidu.com (consumer AI search) · Qianfan Intelligent Search Generation API
Alibaba — Qianwen (formerly Tongyi; Qwen model family)
ByteDance — Doubao
Tencent — Yuanbao (Hunyuan, integrates WeChat 公众号 content as citation source)
DeepSeek — deepseek.com (API-served; no public web-crawler UA)

Industry:

Hunt, M. (2026-01-21). International SEO in 2026: What still works, what no longer does, and why. Search Engine Land
36氪 (2025-02-11). 2025 搜索之战愈演愈乱：从新旧王朝到三’族’鼎立. 36kr.com

Frequently asked questions

Isn't multilingual GEO just hreflang plus translation?

No. Hreflang routes the right page to the right SERP slot — that is traditional-search hygiene, and Google states it does not even use hreflang for language detection in the first place (its own algorithms do). Translation produces a zh string from an en source — that is also hygiene. The load-bearing fact is that English and Chinese AI engines retrieve from different webs, and your en presence does not transfer to your zh presence. Source-pool work — earning attestation inside the zh web (Wikipedia-zh, Baidu Baike, Zhihu, vertical-Chinese venues) — is what multilingual GEO actually requires.

Does sameAs across language Wikipedias resolve my brand globally?

Resolution is corroborated, not asserted — the same bound Entity Recognition states for the single-language case. sameAs from your en page to en-Wikipedia, zh-Wikipedia, and the Wikidata Q-id is the strongest explicit join key, but the join survives only when the rest of the web does not contradict it. A long-tail brand with no zh-side attestation cannot be resolved in zh by sameAs alone — there is nothing for sameAs to confirm against. Cross-lingual entity linking research (Botha et al., EMNLP 2020) treats the Wikidata Q-id as the language-agnostic backbone, but its measured success rises with how widely an entity is attested in BOTH language pools, not in one.

Will translating my best English entries fix this?

Partly, and only the easiest part. A high-quality zh translation makes you readable in the zh pool — that is the floor. It does not make you cited there. Being cited requires the same downstream work English needed: a per-language mention pool, links from zh-pool authorities, Wikipedia-zh or Baidu Baike attestation. A translated page with no zh-side mention pool sits in the index but rarely makes the retrieved set. Worse: a literal en→zh translation that ignores zh chunk shape and idiom often underperforms even basic citability — the page reads stiff to both readers and extraction heuristics.

Which Chinese AI engines actually matter for GEO right now?

The consumer surfaces with the largest measured user bases as of 2025–2026 are Baidu's AI Search (ERNIE-backed, integrated with Baidu's index), Alibaba's Qianwen (formerly Tongyi, Qwen model family), ByteDance's Doubao, Tencent's Yuanbao (Hunyuan-based, cites WeChat 公众号 content), and DeepSeek's chat product. Industry coverage (36kr, Feb 2025) tracks 230M+ Chinese AI search users and a three-way ecosystem split — traditional search (Baidu/Google), social search (Xiaohongshu ~600M daily searches), and AI-native assistants — a shape with no exact Western parallel. Yuanbao's citation of WeChat-internal content is the clearest example of the source-pool asymmetry: no Western engine retrieves from that walled garden.

Where is the rigorous data on en-vs-zh AI-search citation behavior?

There is no published rigorous benchmark of zh-engine vs en-engine citation-preference deltas as of 2026-05. The closest published work is Wu et al. (arXiv:2410.21970, 2024) on multilingual retrieval-augmented generation, which shows high-resource languages dominate knowledge selection and English benefits from a structural selection bias in multilingual RAG — direction, not a brand-citation coefficient. Anyone who quotes a precise number for 'zh AI engines cite X% more brand-named sources than en engines do' is over-claiming. The structural fact (pools differ) is well-attested by observation; the coefficient is not.

Sources

Primary

GEO: Generative Engine Optimization (Aggarwal et al., KDD '24) · arXiv / ACM SIGKDD · 2024-08-25
Not All Languages Are Equal: Insights into Multilingual Retrieval-Augmented Generation (Wu et al., 2024) · arXiv · 2024-10-29
Entity Linking in 100 Languages (Botha, Shan & Gillick, EMNLP 2020) · ACL Anthology / EMNLP 2020 · 2020-11-16
sameAs — Schema.org Property · Schema.org
Tell Google about localized versions of your page (hreflang) · Google Search Central
Help:Multilingual · Wikidata / Wikimedia Foundation
Baidu AI Search (chat.baidu.com) · Baidu, Inc.
Qianfan — Intelligent Search Generation API reference · Baidu Cloud (Qianfan)
Qianwen (Alibaba AI Assistant, formerly Tongyi) · Alibaba
DeepSeek — official site · DeepSeek
Doubao — ByteDance AI assistant · ByteDance
Yuanbao — Tencent AI assistant · Tencent

Secondary

International SEO in 2026: What still works, what no longer does, and why · Search Engine Land (Motoko Hunt)
2025 搜索之战愈演愈乱：从新旧王朝到三'族'鼎立 · 36氪