Written on March 17, 2026
Updated on March 19, 2026

XML Feed: sitemap, RSS feed, and crawl signals for search engines

Definition

An XML feed is an XML-formatted file that lists or describes content in a machine-readable way. In SEO, we distinguish the XML sitemap (a list of pages to index) from the RSS/Atom feed (a stream of new content). In 2026, sitemaps are also read by AI crawlers to discover your content.

What is an XML feed in SEO?

An XML feed is a file in XML (eXtensible Markup Language) format that structures data in a standardized, machine-readable format. In SEO, two types of XML feeds are used: the XML sitemap (list of URLs to index, with metadata like modification date and priority) and the RSS or Atom feed (stream of newly published content, primarily used for blogs and podcasts). Both serve as crawl signals for search engines.
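
To make this concrete, here is a minimal XML sitemap with two entries; the URLs and dates below are placeholders:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://yoursite.com/</loc>
        <lastmod>2026-03-17</lastmod>
      </url>
      <url>
        <loc>https://yoursite.com/blog/xml-feed</loc>
        <lastmod>2026-03-19</lastmod>
      </url>
    </urlset>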

XML sitemap in 2026: more than a Google signal

The XML sitemap remains one of the most important technical tools for indexing. It lets you signal to Google which pages you want indexed, their last modification date, and how they are organized. Submitted via Google Search Console, it accelerates the discovery and recrawling of new pages. In 2026, the sitemap also plays a growing role for AI crawlers: GPTBot (OpenAI), PerplexityBot (Perplexity), and ClaudeBot (Anthropic) read sitemaps to discover a site's available content before crawling it. A well-maintained sitemap is therefore also a GEO (generative engine optimization) signal.

What we observe at Vydera in sitemap audits

The most frequent sitemap errors in our audits: sitemaps that include noindex or redirected URLs (a contradiction between what you ask Google to index and what you block), sitemaps not updated after migrations or page deletions (Google keeps crawling URLs that no longer exist), and sitemaps simply never submitted to Search Console. In that last case, page discovery depends entirely on internal linking. The first two issues can be caught with a simple script; see the sketch below.
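
To illustrate, here is a minimal Python sketch of such a check. It assumes the third-party requests package; the sitemap URL is a placeholder, and a real audit would also handle sitemap indexes, retries, and rate limiting:

    import re
    import sys
    import xml.etree.ElementTree as ET

    import requests

    # sitemaps.org namespace used by <urlset> and <loc> elements
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    # Crude check for a <meta name="robots" ... noindex ...> tag in the HTML
    META_NOINDEX = re.compile(
        r'<meta[^>]+name=["\']robots["\'][^>]*noindex', re.IGNORECASE
    )

    def audit_sitemap(sitemap_url: str) -> None:
        """Flag sitemap URLs that redirect, return errors, or carry noindex."""
        root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
        for loc in root.findall(".//sm:loc", NS):
            url = (loc.text or "").strip()
            resp = requests.get(url, timeout=10, allow_redirects=False)
            if 300 <= resp.status_code < 400:
                print(f"REDIRECT {resp.status_code}: {url}")
            elif resp.status_code >= 400:
                print(f"ERROR {resp.status_code}: {url}")
            elif ("noindex" in resp.headers.get("X-Robots-Tag", "").lower()
                  or META_NOINDEX.search(resp.text)):
                print(f"NOINDEX: {url}")

    if __name__ == "__main__":
        # Placeholder default: pass your own sitemap URL as an argument.
        audit_sitemap(sys.argv[1] if len(sys.argv) > 1
                      else "https://yoursite.com/sitemap.xml")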

XML sitemap best practices

  • Only include canonical, indexable, non-redirected pages.
  • Automatically update the lastmod date with each significant content modification.
  • Use a sitemap index if your site exceeds 50,000 URLs (one file per content type: pages, articles, products); see the example after this list.
  • Submit the sitemap to Google Search Console and Bing Webmaster Tools.
  • Reference the sitemap in the robots.txt file via the line Sitemap: https://yoursite.com/sitemap.xml.
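
For sites above the 50,000-URL limit, a sitemap index looks like the sketch below; the file names are placeholders:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://yoursite.com/sitemap-pages.xml</loc>
        <lastmod>2026-03-19</lastmod>
      </sitemap>
      <sitemap>
        <loc>https://yoursite.com/sitemap-articles.xml</loc>
      </sitemap>
      <sitemap>
        <loc>https://yoursite.com/sitemap-products.xml</loc>
      </sitemap>
    </sitemapindex>

You then submit the index file itself to Google Search Console; the secondary sitemaps are discovered through it.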

Go further

Sitemap verification is part of every technical audit. Find our analyses on Vydera Lab or contact us for a technical configuration audit.

Frequently asked questions

What is the difference between an XML sitemap and an RSS feed?

The XML sitemap is a complete list of your site's URLs intended for search engines: it tells them which pages exist and when they were last modified. The RSS feed is a chronological stream of the latest published content, primarily intended for aggregators and feed readers. Both are XML, but their use and audience differ. Search engines primarily rely on the XML sitemap for indexing.
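
For comparison, here is a minimal RSS 2.0 feed; the titles, URLs, and date are placeholders:

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>Vydera Lab</title>
        <link>https://yoursite.com/blog</link>
        <description>Latest articles</description>
        <item>
          <title>XML Feed: sitemap, RSS feed, and crawl signals</title>
          <link>https://yoursite.com/blog/xml-feed</link>
          <pubDate>Thu, 19 Mar 2026 09:00:00 GMT</pubDate>
        </item>
      </channel>
    </rss>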

How many sitemaps can a site have?

As many as needed. A single sitemap can contain up to 50,000 URLs and must not exceed 50 MB uncompressed. Beyond that, you need a sitemap index: an XML file listing multiple secondary sitemaps (one per content type, for example). There is no limit to the number of sitemaps a site can have, but each must be referenced in the sitemap index, which you then submit to Google Search Console.

Do AI crawlers read XML sitemaps?

Yes. AI crawlers like GPTBot (OpenAI), PerplexityBot (Perplexity), and ClaudeBot (Anthropic) read XML sitemaps to discover a site's available content. A well-maintained sitemap, referenced in robots.txt, makes it easier for these crawlers to discover and crawl your content, contributing to your visibility in LLMs' generative answers.
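
As an illustration, a minimal robots.txt that keeps these crawlers allowed and points them to the sitemap might look like this; the GPTBot block is just one example, and the access policy is yours to decide:

    User-agent: *
    Allow: /

    User-agent: GPTBot
    Allow: /

    Sitemap: https://yoursite.com/sitemap.xml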

Should the sitemap include every page of the site?

No. The sitemap should include only the pages you want indexed: canonical pages, without noindex, without redirects. Including noindex pages, 301s, 404s, or duplicates in the sitemap creates contradictions that Google interprets negatively. The sitemap is a priority recommendation, not an exhaustive inventory: include only what you genuinely want Google to crawl and index first.