Screaming Frog lets you upload your sitemap file and check the URLs in it for status codes such as 200, 301, 302, 404, and 500. This blog post covers how to upload your sitemap as a list and crawl it to find 404 errors easily. It also explains why a clean sitemap matters for SEO success. Let’s get started.
Save Your Sitemap as a File
The first step is to find your sitemap and save it as a file. If you are unsure where your sitemap is, check Search Console or Bing Webmaster Tools and look for the sitemap section (assuming you have submitted it to those tools). You can also try adding “/sitemap.xml” to the end of your domain to see if anything comes up. If you still can’t find your sitemap, you can use Screaming Frog to crawl your entire site and look for the sitemap URL. If you do not have a sitemap, you can check out this guide from Screaming Frog on how to create one for free using their service.
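If you can open your robots.txt file, it is also worth checking there, since many sites list their sitemap on a Sitemap: line. Below is a minimal Python sketch that pulls any declared sitemap URLs out of the text of a robots.txt file using the standard library’s urllib.robotparser (site_maps() needs Python 3.8 or newer) and then adds two common default locations to try by hand. The helper name candidate_sitemap_urls and the list of default paths are my own assumptions for illustration, not something Screaming Frog provides.

```python
from __future__ import annotations

from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Assumed default paths to try: /sitemap.xml is a common convention and
# /sitemap_index.xml is where the Yoast SEO plugin puts its sitemap index.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemap_urls(site_root: str, robots_txt: str) -> list[str]:
    """Return sitemap URLs declared in robots.txt, then common default paths."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when robots.txt has no Sitemap: lines.
    candidates = list(parser.site_maps() or [])
    for path in COMMON_SITEMAP_PATHS:
        url = urljoin(site_root, path)
        if url not in candidates:
            candidates.append(url)
    return candidates
```

To use it, paste in the contents of your own robots.txt along with your domain, then open each returned URL in your browser until one loads.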
To save your sitemap, open it in your browser, right-click on the page, and choose Save As. It’s really that simple. As a note, if you run WordPress with the Yoast SEO plugin, Yoast generates a sitemap for you automatically.
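If you want a quick look at what is inside the saved file before uploading it, here is a minimal Python sketch that reads a sitemap with the standard library’s xml.etree.ElementTree and lists every <loc> URL. It assumes the file uses the sitemaps.org protocol namespace. It also reports whether the file is a regular urlset or a sitemapindex, since plugins like Yoast produce an index that points to several child sitemaps; in that case you would likely want to save and check each child sitemap too. The function name sitemap_urls is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_data: str | bytes) -> tuple[str, list[str]]:
    """Return the root type ("urlset" or "sitemapindex") and every <loc> URL."""
    root = ET.fromstring(xml_data)
    # Tags come back as "{namespace}name"; keep only the bare name.
    kind = root.tag.split("}")[-1]
    # ".//" searches all descendants, so this finds <loc> inside <url> or <sitemap>.
    urls = [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]
    return kind, urls
```

For a file saved as sitemap.xml (a placeholder name), open it in binary mode and pass the bytes in: `kind, urls = sitemap_urls(open("sitemap.xml", "rb").read())`.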
How to Find 404 Errors in Your Sitemap
With the file saved, head over to Screaming Frog to upload and check the sitemap. In the tool, click Mode –> List, then Upload –> From a File, and choose your document. With this uploaded, you can start the crawl of your sitemap. When I crawled my own sitemap, I immediately saw that three posts in it returned 301 redirects instead of 200s. Every URL in a sitemap should return a 200 status code, so I already had some work to look into.
If you have a large site, click the Export button. From there, you can create a pivot table and break out the URLs by status code to see every error in your sitemap. You might also want to check the total number of 404 errors across your whole site after pulling this report. Here is a helpful post on how to find 404 errors in Screaming Frog.
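If you prefer a script to a spreadsheet pivot table, the same breakdown can be sketched in Python with the csv module. This assumes your export has columns named “Address” and “Status Code” (check the header row of your own file and adjust the names if they differ); group_by_status and load_export are hypothetical helper names.

```python
from __future__ import annotations

import csv
from collections import defaultdict
from typing import Iterable


def group_by_status(rows: Iterable[dict]) -> dict[str, list[str]]:
    """Group crawled URLs by status code, like a pivot table on the export."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        # Assumed column names from the crawl export; adjust to match your file.
        groups[row["Status Code"]].append(row["Address"])
    return dict(groups)


def load_export(path: str) -> list[dict]:
    """Read a CSV export into a list of rows keyed by the header names."""
    # utf-8-sig drops a byte-order mark if the file starts with one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

For example, `group_by_status(load_export("export.csv"))` (the file name is a placeholder) returns a dictionary such as {"200": [...], "301": [...], "404": [...]}, so the 404 list is exactly the set of URLs to fix or remove from the sitemap.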
Why are Clean Sitemaps Important for SEO?
Having a clean sitemap is a great way to optimize your site for search engines. If you have a robots.txt file, it should reference your sitemap with a Sitemap: line. Search bots typically request robots.txt before crawling your site, and they will follow the link to your sitemap if you give it to them. Your sitemap should list all the pages you want crawled and indexed, which helps search bots prioritize what to crawl. A clean sitemap matters even more if your internal linking strategy is weak, your site is new and needs all the help it can get to be indexed, your site is large, or your navigation architecture is poor. In these cases, the list of pages becomes an extra guide that helps bots discover and crawl your pages. Keep in mind that your sitemap is public, so your competitors can see it too.
Another benefit of sitemaps is that you can see how many pages are submitted and indexed by Google and Bing. You want the ratio of submitted to indexed pages to be as close to 1:1 as possible. If you see a big gap, chances are you have a messy sitemap. Search Console will flag some errors, but you really need a crawler like Screaming Frog once you are dealing with thousands of URLs or more. If the submitted count is much higher than the indexed count, you are submitting pages to be indexed that are not being indexed. This is often one of the reasons websites don’t rank as well as they would like. If you have concerns about a URL in your sitemap, you can use the URL Inspection tool in Search Console (formerly Fetch as Google) to see what Google sees and whether Googlebot is having a hard time reaching that URL.
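If you only have the two counts from the report, the ratio is indexed pages divided by submitted pages, so 950 indexed out of 1,000 submitted gives 0.95. If you can also get a list of indexed URLs (for example from an export, assuming your tool offers one), a short Python sketch can show which submitted URLs are missing from the index. index_coverage is a hypothetical helper, and it only counts a URL as indexed if it appears in both lists.

```python
from __future__ import annotations

from typing import Iterable


def index_coverage(submitted: Iterable[str], indexed: Iterable[str]) -> tuple[float, list[str]]:
    """Return (indexed/submitted ratio, submitted URLs that are not indexed)."""
    submitted_set = set(submitted)
    if not submitted_set:
        return 0.0, []
    # Only count indexed URLs that were actually submitted in the sitemap.
    indexed_from_sitemap = submitted_set & set(indexed)
    ratio = len(indexed_from_sitemap) / len(submitted_set)
    missing = sorted(submitted_set - indexed_from_sitemap)
    return ratio, missing
```

A ratio near 1.0 matches the 1:1 goal, and the missing list is a good place to start checking URLs in Screaming Frog or the URL Inspection tool.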
If you are looking to generate a sitemap for your own site, I would recommend checking out the XML sitemap generator. This tool will help you create a sitemap that is compliant with the major search engines.