Paste your domain or any URL on it — the tool resolves the site root to locate robots.txt and sitemaps.
The tool reads robots.txt, discovers and follows your sitemaps, then tests every sitemap URL against the disallow rules.
See each sitemap URL that robots.txt would block, with the matching rule shown so you can fix the mixed signal.
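The first step above, turning a pasted domain or URL into the site root, can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the helper names `site_root` and `robots_url` are hypothetical, and defaulting to `https` when no scheme is given is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def site_root(user_input: str) -> str:
    """Return scheme://host[:port] for a bare domain or a full URL."""
    text = user_input.strip()
    # Assumption: a bare domain such as "example.com/blog" means https.
    if "://" not in text:
        text = "https://" + text
    parts = urlsplit(text)
    # Drop the path, query, and fragment; keep only scheme and host.
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


def robots_url(user_input: str) -> str:
    """robots.txt always lives at the root of the host."""
    return site_root(user_input) + "/robots.txt"
```

For example, `robots_url("example.com/blog/post?x=1")` yields `https://example.com/robots.txt`.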
It finds URLs that you have listed in your XML sitemap but that your own robots.txt blocks from being crawled. This is a self-defeating contradiction: a sitemap tells search engines "please crawl these pages," while a robots.txt Disallow tells them "do not crawl this path." When both apply to the same URL, the Disallow usually wins, because a compliant crawler will not fetch a blocked URL no matter where it is listed. You end up advertising pages you are simultaneously forbidding. The tool lists every such conflicting URL so you can resolve the mixed signal.
Search engines treat a sitemap as a list of pages you consider important and want indexed. Filling it with URLs that robots.txt blocks wastes crawl budget, generates coverage warnings in Search Console, and can leave those URLs in a confusing state — sometimes indexed without a description, sometimes dropped. It also signals a configuration error that can erode trust in your sitemap overall. Keeping the sitemap and robots.txt consistent is basic technical hygiene.
It fetches your robots.txt, parses the Disallow and Allow rules for the relevant user-agent groups, and then tests each sitemap URL's path against those rules using standard longest-match precedence — the same matching logic search engines use. A URL is reported as blocked when the most specific matching rule is a Disallow. It also handles wildcards and end-of-path anchors so the matching reflects how crawlers actually interpret the file.
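A minimal sketch of longest-match precedence with `*` wildcards and `$` end anchors is shown below. It is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, the rules are assumed to be already parsed for one user-agent group, and an Allow is assumed to win when an Allow and a Disallow of equal length both match.

```python
import re
from typing import List, Optional, Tuple


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def blocking_rule(path: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Return the Disallow pattern that blocks `path`, or None if crawlable.

    `rules` holds ("allow" | "disallow", pattern) pairs for one group.
    `path` should include the query string, if any.
    """
    best = None  # (pattern length, is_allow, pattern)
    for kind, pattern in rules:
        if not pattern:  # an empty Disallow matches nothing
            continue
        # re.match anchors at the start, so patterns are prefix matches.
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow", pattern)
            # Longer pattern wins; on a tie, Allow (True) beats Disallow.
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None or best[1]:
        return None
    return best[2]
```

With the rules `Disallow: /private/`, `Allow: /private/public/`, and `Disallow: /*.pdf$`, the path `/private/a.html` is blocked by `/private/`, while `/private/public/page` is crawlable because the longer Allow pattern takes precedence.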
Yes. Many sites use a sitemap index that points to several child sitemaps. The detector recognizes a sitemap index, follows the child sitemap references it lists, and collects URLs from them so the check covers your full set of sitemaps rather than just the top-level index. This matters because the conflicting URLs often live in a specific child sitemap, such as one generated automatically for a section you later decided to disallow.
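Following a sitemap index could look roughly like this. The sketch assumes sitemaps use the standard sitemaps.org namespace, and it takes a hypothetical `fetch` callable (URL in, XML text out) so that the network layer stays out of the example. `collect_urls` is an illustrative name, not part of the tool.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

# Standard sitemap namespace, in ElementTree's {uri}tag form.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Return page URLs from a sitemap, descending into sitemap indexes."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against indexes that reference each other
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    locs = [el.text.strip() for el in root.iter(NS + "loc") if el.text]

    if root.tag == NS + "sitemapindex":
        # Each <loc> here points at a child sitemap, not a page.
        urls: List[str] = []
        for child in locs:
            urls.extend(collect_urls(child, fetch, seen))
        return urls
    # A plain <urlset>: each <loc> is a page URL.
    return locs
```

Keeping a `seen` set is a defensive choice so that a misconfigured index cannot send the walk into a loop.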
It reads the Sitemap: directives declared in your robots.txt, which is the standard place to announce sitemap locations, and it also tries the conventional /sitemap.xml path as a fallback. This means you get a check even if you forgot to declare the sitemap in robots.txt. If your sitemaps live at non-standard URLs and are not referenced anywhere, point the tool at the domain and add the Sitemap directive so both crawlers and this tool can find them.
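The discovery step can be approximated with a few lines of parsing. This sketch assumes the robots.txt body has already been downloaded, and it uses the `/sitemap.xml` fallback only when no `Sitemap:` directive is present. Both the function name `discover_sitemaps` and that fallback policy are assumptions for illustration.

```python
def discover_sitemaps(robots_txt: str, root: str) -> list:
    """Return declared sitemap URLs, or the conventional fallback."""
    found = []
    for line in robots_txt.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's "https:" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    # Assumption: fall back to /sitemap.xml only if nothing was declared.
    return found or [root.rstrip("/") + "/sitemap.xml"]
```

The directive name is matched case-insensitively, so `sitemap:` and `Sitemap:` are treated the same.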
Decide which signal is correct for each URL. If the page should be crawlable and indexed, remove or narrow the robots.txt Disallow rule that catches it. If the page should genuinely be blocked, remove it from the sitemap instead. Remember that blocking a page in robots.txt is not the right way to keep it out of the index, because a crawler that cannot fetch the page never sees an instruction placed on it. To keep a page out of the index, leave it crawlable and apply noindex (a robots meta tag or an X-Robots-Tag response header). The goal is one clear, consistent instruction per URL.
Almost always, but not strictly. The two files serve opposite purposes, so listing a disallowed URL in a sitemap is contradictory by design. The rare exception is transitional states during a migration, but even then it is worth cleaning up promptly. Treat every conflict the tool reports as something to investigate and resolve, because at minimum it produces coverage warnings and wasted crawl effort.