Sitemap Finder & Checker
Auto-discover any site's XML sitemap, validate against the protocol, and find indexation gaps. Detects orphan URLs, missing lastmod values, and oversized sitemaps.
Find every sitemap on a site, and validate it actually works
About 40% of sites have a sitemap that Google can't use. The sitemap is in the wrong location, exceeds the 50,000-URL limit, has invalid lastmod values, or contains URLs that return 404. The site owner doesn't know because Google quietly ignores broken sitemaps.
This checker auto-discovers sitemaps by reading the Sitemap: lines in robots.txt, then probing common locations (/sitemap.xml, /sitemap_index.xml, /sitemap1.xml) and other common naming patterns. It validates each one against the sitemaps.org protocol and reports problems ranked by severity.
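The discovery step can be sketched roughly as follows. This is a minimal illustration, not the checker's actual code: the function names are made up for the example, the fallback paths are the three listed above, and fetching robots.txt over HTTP is left out so the logic stays testable offline.

```python
from urllib.parse import urljoin

# Fallback paths taken from the list above; real sites may use others.
COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap1.xml"]


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on `Sitemap:` lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def candidate_sitemaps(site_root: str, robots_txt: str) -> list[str]:
    """Declared sitemaps first, then common locations, without duplicates."""
    candidates = sitemaps_from_robots(robots_txt)
    candidates += [urljoin(site_root, path) for path in COMMON_PATHS]
    return list(dict.fromkeys(candidates))  # de-duplicate, keep order
```

Each candidate would then be requested in turn; anything that returns XML with a urlset or sitemapindex root is a sitemap worth validating.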
The 5 sitemap rules Google enforces silently
Max 50,000 URLs per file. More than that → split into multiple sitemaps with a sitemap-index.xml pointing to them.
Max 50MB per file (uncompressed). If a file with 50K URLs or fewer hits the size limit, its entries average more than about 1 KB each, which usually means your URLs are too long.
URLs must be on the same domain as the sitemap (or use cross-domain sitemap submission via Search Console).
lastmod must be a valid W3C Datetime value (a profile of ISO 8601), such as 2024-05-01 or 2024-05-01T12:00:00+00:00. Wrong format → Google ignores the field and loses the signal to re-crawl when you update content.
URLs must be the canonical, indexable versions: no redirects, no noindex pages, no parameter URLs.
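The first four rules can be checked offline from the sitemap file alone. The sketch below is one way to do it, assuming the file is already downloaded and decompressed; check_sitemap is a hypothetical name, and the regular expression only tests the shape of the date, not whether the month or day is in range. The fifth rule needs an HTTP request per URL, so it is not covered here.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file

# Shape of a W3C Datetime value: YYYY, YYYY-MM, YYYY-MM-DD,
# or a timestamp that ends in a time zone (Z or +hh:mm / -hh:mm).
LASTMOD_RE = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def check_sitemap(xml_bytes: bytes, sitemap_url: str) -> list[str]:
    """Return a list of human-readable problems found in one sitemap file."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB uncompressed")

    root = ET.fromstring(xml_bytes)
    urls = root.findall("sm:url", NS)
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the 50,000 limit")

    sitemap_host = urlsplit(sitemap_url).netloc.lower()
    for url in urls:
        loc = (url.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        if urlsplit(loc).netloc.lower() != sitemap_host:
            problems.append(f"different host: {loc}")
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is not None and not LASTMOD_RE.match(lastmod.strip()):
            problems.append(f"invalid lastmod {lastmod.strip()!r} for {loc}")
    return problems
```

An empty list means the file passed these four checks; it says nothing about whether the listed URLs redirect, return 404, or carry noindex.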
How to use sitemaps strategically (not just submit and forget)
Split sitemaps by content type
Have one /sitemap-blog.xml, one /sitemap-products.xml, one /sitemap-pages.xml. Why? Search Console reports indexation per sitemap, so splitting them lets you see which content type has indexation problems and which is healthy.
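A rough sketch of the split, assuming content type can be inferred from the URL path. The /blog/ and /products/ prefixes and both function names are hypothetical; adjust them to your own URL structure.

```python
from collections import defaultdict
from urllib.parse import urlsplit
from xml.sax.saxutils import escape

# Hypothetical mapping from path prefix to sitemap file name.
SECTIONS = {"/blog/": "sitemap-blog.xml", "/products/": "sitemap-products.xml"}
DEFAULT_FILE = "sitemap-pages.xml"  # everything that matches no prefix


def group_by_section(urls):
    """Group URLs by the sitemap file they belong in."""
    groups = defaultdict(list)
    for url in urls:
        path = urlsplit(url).path
        filename = next(
            (name for prefix, name in SECTIONS.items() if path.startswith(prefix)),
            DEFAULT_FILE,
        )
        groups[filename].append(url)
    return dict(groups)


def build_index(base_url, filenames):
    """Build a sitemap index that points at each per-section file."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(base_url.rstrip('/') + '/' + name)}</loc></sitemap>"
        for name in filenames
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )
```

Submitting the per-section files (or the index that lists them) gives Search Console one indexation report per content type.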
Use accurate lastmod values
Don't set lastmod to today's date for every URL on every build (a common default in CMSs and site generators). Google notices and starts ignoring lastmod entirely. Only update lastmod when the content has meaningfully changed.
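One way to keep lastmod honest at build time is to store a hash of each page's content and move the date only when the hash changes. The sketch below assumes you can persist a small state dictionary between builds; update_lastmod and the state layout are illustrative, not part of any CMS.

```python
import hashlib
from datetime import date


def update_lastmod(url, content, state, today=None):
    """Return the lastmod for `url`, changing it only when `content` changed.

    `state` maps url -> (content_hash, lastmod) and is updated in place.
    """
    today = today or date.today().isoformat()
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = state.get(url)
    if previous is not None and previous[0] == digest:
        return previous[1]  # unchanged content keeps its old date
    state[url] = (digest, today)
    return today
```

Hashing the full HTML treats any byte difference as a change, so in practice you would probably hash only the main content and leave out navigation, footers, and timestamps.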
Skip the priority and changefreq fields
Google has publicly stated that it ignores <priority> and <changefreq>. They're still in the spec, but they waste bytes. Skip them.
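If your generator can't be configured to omit these fields, a post-processing step can remove them. This is a small sketch using only the standard library; strip_ignored_fields is a hypothetical helper, and it assumes a plain urlset sitemap that fits in memory.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def strip_ignored_fields(xml_bytes: bytes) -> bytes:
    """Return the sitemap with every <priority> and <changefreq> removed."""
    # Keep the sitemap namespace as the default one so the output
    # has no "ns0:" prefixes.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_bytes)
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        for tag in ("priority", "changefreq"):
            for node in url.findall(f"{{{SITEMAP_NS}}}{tag}"):
                url.remove(node)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```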
Include the sitemap in robots.txt
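Add one line to robots.txt: Sitemap: https://yoursite.com/sitemap-index.xml. It's a single line and one of the easiest indexation wins, because any crawler that reads robots.txt can find the sitemap. Many sites only submit via Search Console and miss this.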
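As a sketch, a deploy step could make sure the line is present without duplicating it. ensure_sitemap_line is a hypothetical helper that works on the robots.txt text; reading and writing the file is left to your build.

```python
def ensure_sitemap_line(robots_txt: str, sitemap_url: str) -> str:
    """Return robots.txt content that declares `sitemap_url` exactly once."""
    for line in robots_txt.splitlines():
        # Split on the first colon only; the URL contains colons too.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already declared, nothing to do
    # Make sure the new directive starts on its own line.
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return f"{robots_txt}Sitemap: {sitemap_url}\n"
```

The Sitemap directive is independent of any User-agent group, so appending it at the end of the file is fine.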