
Sitemap Finder & Checker

Auto-discover any site's XML sitemap, validate against the protocol, and find indexation gaps. Detects orphan URLs, missing lastmod values, and oversized sitemaps.

Auto-discover sitemaps via robots.txt and common paths. Validates URL count, lastmod format, and reports any URLs that 404.
What it does

Find every sitemap on a site, and validate it actually works

About 40% of sites have a sitemap that Google can't use: it's in the wrong location, exceeds the 50,000-URL limit, has invalid lastmod values, or contains URLs that 404. The site owner often doesn't know, because Google quietly skips what it can't use, and the errors only show up in Search Console if someone goes looking.

This checker auto-discovers sitemaps via robots.txt, common locations (/sitemap.xml, /sitemap_index.xml, /sitemap1.xml), and common patterns. It validates each one against the sitemaps.org protocol and reports problems with severity ranking.
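Here is a minimal Python sketch of the discovery step. It assumes you've already downloaded robots.txt as a string. It collects Sitemap: declarations first, then falls back to the common paths listed above. The function names and COMMON_PATHS are illustrative, not this checker's actual code, and the fetching itself is left out.

```python
from urllib.parse import urljoin

# Fallback locations from the text above; a real checker may try more.
COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap1.xml"]


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return every URL declared with a Sitemap: directive in robots.txt."""
    found = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")   # split at the first colon only
        # The field name is case-insensitive; the value is a full URL.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def candidate_sitemaps(origin: str, robots_txt: str) -> list[str]:
    """robots.txt declarations first, then the common paths, de-duplicated."""
    candidates = sitemaps_from_robots(robots_txt)
    candidates += [urljoin(origin, path) for path in COMMON_PATHS]
    return list(dict.fromkeys(candidates))  # dict keys keep insertion order
```

For example, a robots.txt containing Sitemap: https://example.com/sitemap-index.xml yields that URL first. It is followed by /sitemap.xml, /sitemap_index.xml and /sitemap1.xml on the same origin. The next step, not shown here, is requesting each candidate and keeping the ones that return 200 with XML content.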

Sitemap rules

The 5 sitemap rules Google enforces silently

  1. Max 50,000 URLs per file. More than that → split into multiple sitemaps with a sitemap-index.xml pointing to them.

  2. Max 50 MB (52,428,800 bytes) per file, uncompressed. At 50,000 URLs that's about 1 KB per entry, so if a full-size sitemap is hitting the size limit, your URLs are too long.

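  3. URLs must use the same protocol and host as the sitemap (www and non-www count as different hosts) and sit in the sitemap's directory or below it. To list URLs from another host, use cross-site submission: verify both sites in Search Console, or reference the sitemap from that host's robots.txt.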

  4. lastmod must be a valid W3C Datetime value (a profile of ISO 8601), e.g. 2024-05-01 or 2024-05-01T14:30:00+00:00. Wrong format → Google ignores the field and won't use it to re-crawl when you update content.

  5. URLs must be the canonical, indexable versions: no redirects, no noindex pages, no parameter URLs.
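Rules 1 to 4 can be checked mechanically once the file is downloaded. Below is a sketch of a hypothetical check_sitemap function that uses only the Python standard library. It handles a plain <urlset> file, not a sitemap index. The lastmod pattern is an approximation of the W3C Datetime formats. Rule 5 needs a crawl of every URL, so it's out of scope here.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, measured uncompressed

# Approximation of W3C Datetime: YYYY, YYYY-MM, YYYY-MM-DD, or a date-time
# with a timezone (Z or +hh:mm / -hh:mm), e.g. 2024-05-01T14:30:00+00:00
LASTMOD_RE = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def check_sitemap(sitemap_url: str, xml_bytes: bytes) -> list[str]:
    """Return problems found for rules 1-4 in a <urlset> sitemap."""
    problems = []

    # Rule 2: the size limit applies to the uncompressed file.
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes):,} bytes (limit {MAX_BYTES:,})")

    root = ET.fromstring(xml_bytes)
    entries = root.findall("sm:url", NS)

    # Rule 1: URL count.
    if len(entries) > MAX_URLS:
        problems.append(f"{len(entries):,} URLs (limit {MAX_URLS:,}); split the file")

    # Rule 3: same scheme and host, and at or below the sitemap's directory.
    home = urlsplit(sitemap_url)
    base_dir = home.path.rsplit("/", 1)[0] + "/"

    for entry in entries:
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        parts = urlsplit(loc)
        same_origin = (parts.scheme, parts.netloc) == (home.scheme, home.netloc)
        if not same_origin or not (parts.path or "/").startswith(base_dir):
            problems.append(f"{loc}: outside the sitemap's host or directory")

        # Rule 4: lastmod is optional, but must be well-formed when present.
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        if lastmod is not None and not LASTMOD_RE.match(lastmod.strip()):
            problems.append(f"{loc}: invalid lastmod {lastmod.strip()!r}")

    return problems
```

Pass it the raw bytes you downloaded. Decompress .xml.gz files first, since the 50 MB limit applies to the uncompressed size. For sitemaps fetched from sites you don't control, the defusedxml package is a safer drop-in for ElementTree's parser.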

Sitemap strategy

How to use sitemaps strategically (not just submit and forget)

Split sitemaps by content type

Have one /sitemap-blog.xml, one /sitemap-products.xml, and one /sitemap-pages.xml. Why? Search Console reports indexation per sitemap, so splitting them shows you which content type has indexation problems and which is healthy.
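When you split by content type, a sitemap index ties the files together. Here is a sketch that assumes the three file names from the paragraph above. yoursite.com is a placeholder domain, and build_sitemap_index is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Return a sitemap index document with one <sitemap><loc> per child file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    # Write the declaration by hand so it always says UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(index, encoding="unicode")


index_xml = build_sitemap_index([
    "https://yoursite.com/sitemap-blog.xml",
    "https://yoursite.com/sitemap-products.xml",
    "https://yoursite.com/sitemap-pages.xml",
])
```

Submit the index once and reference it in robots.txt. Search Console should then let you drill into each child sitemap separately.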

Use accurate lastmod values

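Don't set lastmod to today's date for every URL on every build (many CMSs and static-site generators do this by default). Google only uses lastmod when it's consistently accurate. If every URL changes on every build, Google will likely stop trusting the field for your site. Only update lastmod when the content has meaningfully changed.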
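One way to keep lastmod honest is to fingerprint each page's content and only bump the date when the fingerprint changes. This is a minimal sketch that assumes your build can store one small record per URL between runs. What counts as "meaningful" here (the main body text, whitespace-normalised) is an assumption you'd tune, for example to exclude timestamps or related-post widgets.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def content_hash(body: str) -> str:
    """Fingerprint the content that matters (assumption: the main body text)."""
    normalized = " ".join(body.split())  # ignore whitespace-only changes
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def update_lastmod(body: str, stored: Optional[dict], now: Optional[datetime] = None) -> dict:
    """Return {'hash', 'lastmod'}, bumping lastmod only if the content changed.

    `stored` is the hypothetical record saved on the previous build (None for a new page).
    """
    digest = content_hash(body)
    if stored is not None and stored["hash"] == digest:
        return stored  # unchanged: keep the old lastmod
    now = now or datetime.now(timezone.utc)
    # W3C Datetime with an explicit UTC offset, e.g. 2024-05-01T12:00:00+00:00
    return {"hash": digest, "lastmod": now.astimezone(timezone.utc).isoformat(timespec="seconds")}
```

A rebuild that only changes whitespace keeps the old date. An edit to the text produces a new one.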

Skip the priority and changefreq fields

Google has publicly stated that it ignores <priority> and <changefreq>. Both are still in the spec, but they only add bytes. Skip them.
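In practice, each <url> then carries just <loc> and, when you know it, <lastmod>. Here is a sketch with a hypothetical build_urlset helper that takes (URL, lastmod) pairs.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages: list[tuple[str, Optional[str]]]) -> str:
    """Return a <urlset> with only <loc> and (when known) <lastmod> per URL.

    No <priority> or <changefreq>: Google ignores them.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & < > for us
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_urlset([
    ("https://yoursite.com/", "2024-05-01"),
    ("https://yoursite.com/about", None),  # no trustworthy date: omit lastmod
])
```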

Include the sitemap in robots.txt

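Add Sitemap: https://yoursite.com/sitemap-index.xml to robots.txt. It's one line and one of the cheapest indexation wins, because every crawler that reads robots.txt (not just Google) can then find your sitemap. Many sites only submit via Search Console and miss this.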
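If you generate robots.txt at build time, a hypothetical helper like this one appends the directive only when it's missing. The Sitemap: line needs an absolute URL. It isn't tied to any User-agent group, so appending it at the end of the file is fine.

```python
def ensure_sitemap_directive(robots_txt: str, sitemap_url: str) -> str:
    """Append a Sitemap: line to robots.txt unless that URL is already declared."""
    if not sitemap_url.startswith(("http://", "https://")):
        raise ValueError("the Sitemap: directive needs an absolute URL")
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")  # split at the first colon only
        if sep and field.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already declared, leave the file alone
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"Sitemap: {sitemap_url}\n"
```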

Frequently asked

FAQs about the Sitemap Finder & Checker

Where should my sitemap live?
At /sitemap.xml or /sitemap_index.xml at the root of your primary domain. Subdomains need their own sitemap. Whatever you choose, declare it in robots.txt with a Sitemap: directive.
Does my site need a sitemap?
Below ~500 pages with strong internal linking, sitemaps add little. Above that, they meaningfully help Google discover and prioritise your pages. Include one anyway; there's no downside.
Should I use an image sitemap?
Yes, for image-heavy sites (e-commerce, photography, recipe sites). Use the image sitemap extension (xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"). It helps with Google Images traffic.
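Here is a sketch of the image extension. It assumes a mapping from each page URL to the image URLs on that page, and build_image_urlset is a hypothetical name. Google's current documentation only uses <image:image> and <image:loc>. Older tags such as <image:caption> have been deprecated, so they're left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

ET.register_namespace("image", IMAGE_NS)  # serialize as image:... not ns1:...


def build_image_urlset(pages: dict[str, list[str]]) -> str:
    """Return a <urlset> where each <url> lists its images with <image:image>."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    body = ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```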
What's the difference between a sitemap and a sitemap index?
A sitemap lists URLs. A sitemap index lists OTHER sitemaps. Use a sitemap index when you have more than 50K URLs (split across multiple files) or when you want to organise by content type.
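Splitting a large URL list comes down to chunking it at 50,000 entries. Here is a sketch with hypothetical file names (sitemap-1.xml, sitemap-2.xml, ...). Each chunk becomes one child sitemap listed in the index. If your entries are long, the 50 MB limit may force smaller chunks, so lower size to match.

```python
from typing import Iterator

MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def plan_sitemaps(urls: list[str], base: str = "https://yoursite.com/sitemap-") -> dict[str, list[str]]:
    """Map each hypothetical child sitemap URL to the URLs it will contain."""
    return {f"{base}{n}.xml": chunk for n, chunk in enumerate(chunk_urls(urls), start=1)}
```

With 120,001 URLs this plans three files holding 50,000, 50,000 and 20,001 URLs.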
How often should I update my sitemap?
Whenever content changes. Most CMSs auto-regenerate on publish, which is ideal. Static-site builds should regenerate on every build.
Should I leave any pages out of my sitemap?
Yes. Sitemaps are for INDEXABLE, canonical pages only. Don't include 404s, redirects, noindex pages, or parameter URLs. The checker flags these as errors.
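Here is a sketch of that filtering step. It assumes a crawl has already produced a status code, a noindex flag and a canonical URL for each page. CrawlResult and its fields are hypothetical. The crawl should record redirects rather than follow them; otherwise a 301 shows up as the target's 200.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class CrawlResult:
    """Hypothetical crawl record; field names are illustrative."""
    url: str
    status: int
    noindex: bool = False
    canonical: Optional[str] = None


def sitemap_problem(page: CrawlResult) -> Optional[str]:
    """Return why a page doesn't belong in the sitemap, or None if it does."""
    if page.status != 200:
        return f"HTTP {page.status}"  # 404s, 3xx redirects, 5xx errors
    if page.noindex:
        return "noindex"
    if page.canonical and page.canonical != page.url:
        return f"canonicalised to {page.canonical}"
    if urlsplit(page.url).query:
        return "parameter URL"
    return None


def split_indexable(pages: list[CrawlResult]) -> tuple[list[str], dict[str, str]]:
    """Separate sitemap-worthy URLs from rejected ones (with reasons)."""
    keep, rejected = [], {}
    for page in pages:
        problem = sitemap_problem(page)
        if problem is None:
            keep.append(page.url)
        else:
            rejected[page.url] = problem
    return keep, rejected
```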
Do <priority> and <changefreq> matter?
No. Google has publicly confirmed that it ignores both <priority> and <changefreq>. The lastmod field is the only optional field it actively uses.
Beyond tools
Need an audit, not a checker?
These tools spot problems. I solve them. Book a free strategy call: I'll review your site live and give you a prioritised fix list.