
[questions/qa-when-lang-neg] Challenges for search engine crawling and indexing #794

@xfq

Description


[source](https://www.w3.org/International/questions/qa-when-lang-neg.en.html) [en]

One of the primary issues with language negotiation is how search engines discover and index web content. Search engine crawlers do not behave like typical users. This discrepancy creates challenges for websites that rely on the Accept-Language header to serve different language versions of a page under the same URL.

A major hurdle is that search engine crawlers do not consistently send an Accept-Language header when requesting a page. Googlebot, for instance, often crawls without this header, or may default to English. This means the crawler may only ever see the default language version of a page. Consequently, other language versions of the content may not be discovered, crawled, or indexed, making them invisible to users searching in those languages.
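As a rough illustration of why this happens, here is a minimal sketch (a hypothetical example, not any particular site's implementation) of server-side negotiation on Accept-Language. When the header is absent, as is often the case for crawlers, the default language is all that gets served:

```python
# Minimal sketch of Accept-Language negotiation with a default-language fallback.
# AVAILABLE and DEFAULT are hypothetical values for illustration only.

AVAILABLE = ["en", "de", "ja"]   # languages the site has translations for
DEFAULT = "en"

def negotiate_language(accept_language: str | None) -> str:
    """Pick the best available language from an Accept-Language header."""
    if not accept_language:
        # Crawlers frequently send no Accept-Language header at all,
        # so they only ever receive the DEFAULT version of the page.
        return DEFAULT
    # Parse entries like "de-CH;q=0.9" into (tag, quality) pairs.
    prefs = []
    for part in accept_language.split(","):
        tag, _, q = part.strip().partition(";q=")
        prefs.append((tag.lower(), float(q) if q else 1.0))
    # Try the user's preferences in descending quality order.
    for tag, _ in sorted(prefs, key=lambda p: p[1], reverse=True):
        primary = tag.split("-")[0]
        if primary in AVAILABLE:
            return primary
    return DEFAULT

print(negotiate_language("de-CH,de;q=0.9,en;q=0.8"))  # -> "de"
print(negotiate_language(None))                       # -> "en" (what a crawler with no header sees)
```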

In addition, serving different content to search engines than to users is a practice known as "cloaking", which search engines may penalize. While language negotiation is not inherently cloaking, it can be misconstrued as such if not implemented carefully.

To avoid these issues, search engines such as Google explicitly recommend using a separate URL for each language version of a page. This approach, combined with hreflang annotations, gives search engines a clear signal about the different language variations of a page and their relationship to one another. hreflang helps search engines decide which language or regional version of a page to show a user based on their language and location settings.
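For illustration, the sketch below (using hypothetical example.com URLs) generates the kind of hreflang `<link rel="alternate">` elements that each separate-URL language version would declare, including an `x-default` fallback for users whose language does not match any listed version:

```python
# Minimal sketch of hreflang annotations for separate-URL language versions.
# The URLs and the language set are hypothetical examples.

versions = {
    "en": "https://example.com/en/page",
    "de": "https://example.com/de/page",
    "ja": "https://example.com/ja/page",
}

def hreflang_links(versions: dict[str, str], default: str = "en") -> str:
    """Build the <link> elements each language version should include in its <head>."""
    lines = [
        f'<link rel="alternate" hreflang="{lang}" href="{url}" />'
        for lang, url in versions.items()
    ]
    # x-default points search engines at the URL to use when no listed language matches.
    lines.append(
        f'<link rel="alternate" hreflang="x-default" href="{versions[default]}" />'
    )
    return "\n".join(lines)

print(hreflang_links(versions))
```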

See also #666 and #405
