How to Clean a Dirty XML Sitemap in WordPress - TM Blast

How to Clean a Dirty XML Sitemap Using WordPress

A clean sitemap is a key part of an effective SEO strategy for your website. While Bing has said it has a low tolerance for dirty sitemaps, Google has said it is more lenient. I believe it is still important to have a clean sitemap nonetheless. A clean sitemap also cuts the number of unnecessary steps the search bots have to take to crawl your sitemap. Assuming you know how to crawl your sitemap for errors, you are now ready for the next step if you use WordPress.


Screaming Frog Results + WordPress


Here is an example of a Screaming Frog pull that I did on my site. I sorted the view to show the three 301 redirects in my sitemap. Looking at the URLs, I can see which blog posts need to be fixed.


301 Redirects Screaming Frog
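
If you want to reproduce the same check without Screaming Frog, here is a minimal Python sketch of the idea: pull every URL out of the sitemap and flag anything that does not answer with a plain 200. The function names (`sitemap_urls`, `fetch_status`, `dirty_urls`) are my own for illustration, and the sketch assumes a standard `urlset` sitemap rather than a sitemap index.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following 301/302 responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx response then surfaces as an HTTPError


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a URL without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def dirty_urls(urls, status_of=fetch_status):
    """Return (url, status) pairs for every URL that is not a clean 200."""
    results = []
    for url in urls:
        status = status_of(url)
        if status != 200:
            results.append((url, status))
    return results
```

Passing the downloaded sitemap text to `sitemap_urls` and the result to `dirty_urls` gives a list of the redirects and errors to clean up. The `status_of` parameter is only there so the check can be tried with canned status codes before pointing it at a live site.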


Assuming you are using Yoast, head over to the SEO area and look for the XML sitemap section. From there, look for the exclude posts section so you can specify which posts to exclude from the sitemap. The picture below shows what you should be seeing. The posts exclude section is where you list the posts you want to exclude.


How to Exclude Posts from your Sitemap


How Do I Find the Post ID in WordPress?


Now that we know where to add the exclusion, we need to find the IDs of the posts that we want to remove. To do this, head to the posts section of your site and open the blog post that you want to remove from your sitemap. If you click on edit, you will see the post ID in the URL at the top of the screen. Here is an example of the ID number that is associated with this blog post. If you have a lot of posts to remove, I would recommend putting these numbers in Excel so you can copy and paste them at the end.


How to Find the Post ID in WordPress
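
If you have a long list of edit URLs, a short script can pull the IDs out for you instead of copying each one by hand. This is a rough sketch: it assumes the edit URL carries the ID in a `post` query parameter (as in `post.php?post=1234&action=edit`), and it joins the IDs with commas on the assumption that the exclude field accepts a comma-separated list, which is worth confirming in your version of Yoast. The function names are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def post_id_from_edit_url(edit_url):
    """Return the numeric post ID from a WordPress edit URL, or None if absent."""
    query = parse_qs(urlparse(edit_url).query)
    values = query.get("post")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


def exclusion_list(edit_urls):
    """Build a comma-separated list of unique post IDs, keeping first-seen order."""
    ids = []
    for url in edit_urls:
        post_id = post_id_from_edit_url(url)
        if post_id is not None and post_id not in ids:
            ids.append(post_id)
    return ",".join(str(post_id) for post_id in ids)
```

For example, feeding it the edit URLs for posts 12, 34, and 12 again would give `12,34`, ready to paste.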


Now that we know what and where the post ID is, we have to head back to the Yoast XML section and paste that number in. Once that is complete, we should download a fresh copy of the sitemap and crawl it again with Screaming Frog to see whether we fixed the issues from the earlier report. One of the neat things about this exercise is that you don’t need the paid version of Screaming Frog to do it. You need to use the list and upload section, and you will actually have an unlimited amount of URLs crawled with this approach. You can then quickly see whether you cleaned up the sitemap.
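
As a quick sanity check alongside the recrawl, you can also confirm that the URLs you excluded have actually dropped out of the new sitemap. Here is a small sketch of that comparison. It assumes you already have the new sitemap's URLs and the excluded URLs as two plain lists, and the helper names are made up for this example.

```python
def normalize(url):
    """Trim whitespace and any trailing slash so near-identical URLs compare equal."""
    return url.strip().rstrip("/")


def still_listed(sitemap_urls, excluded_urls):
    """Return the excluded URLs that still appear in the new sitemap."""
    listed = {normalize(url) for url in sitemap_urls}
    return [url for url in excluded_urls if normalize(url) in listed]
```

An empty result means every excluded post is gone from the sitemap. Anything left in the list is a post whose ID probably did not make it into the exclude field.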


Why Do Some URLs Never Get Indexed in Google and Bing?


There are many reasons why some of your key pages don’t receive any organic traffic. Assuming you have the basics like title tags covered, you might want to take a deeper dive with tools like Google Search Console and Bing Webmaster Tools. Look at the Search Console report from Google specifically to see what the ratio of submitted to indexed pages is. You want that ratio to be as close to 1:1 as possible. Anything that looks far off usually needs a deeper dive to figure out why the pages are not being indexed. Here is an example of what Google Search Console says about my sitemap. As you can see here, I am very close to the 1:1 ratio, so I know that what I am trying to present is in fact being indexed. If you are ever curious why some pages are not being crawled properly, you can use the fetch tools in Google Search Console and Bing Webmaster Tools.


Clean Sitemap
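
The ratio itself is simple arithmetic: indexed pages divided by submitted pages. Here is a tiny sketch of how you might track it. The 0.9 cutoff for "needs a deeper dive" is purely my own assumption for illustration, not a number from Google or Bing, and the counts in the comment are hypothetical.

```python
def index_ratio(submitted, indexed):
    """Return indexed/submitted, where 1.0 means every submitted page is indexed."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive number of pages")
    return indexed / submitted


def needs_review(submitted, indexed, threshold=0.9):
    """Flag a sitemap whose indexed share falls below the chosen threshold."""
    return index_ratio(submitted, indexed) < threshold


# Hypothetical counts: 100 pages submitted, 95 indexed.
ratio = index_ratio(100, 95)
```

With those made-up counts the ratio is 0.95, which is close enough to 1:1 that `needs_review` would not flag it.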


Another reason why certain pages might not be indexed is an improper sitemap file. You should be submitting an XML sitemap file to the search engines. You might also have restrictions in your robots.txt file that block certain pages from being seen by their bots. You could also have NOINDEX tags on your site that keep these pages from being indexed. Either way, you now have a few things to look at when you are reviewing your site.
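
Both of those checks can be scripted. Below is a minimal sketch using only Python's standard library: one function tests a URL against the text of a robots.txt file, and another looks for a robots meta tag containing `noindex` in a page's HTML. The function names are my own, and the sketch does not cover every case (for example, a noindex sent through an `X-Robots-Tag` HTTP header is not checked here).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text disallows the URL for the user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class _RobotsMetaFinder(HTMLParser):
    """Record whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def has_noindex(html_text):
    """Return True if the page HTML carries a robots meta tag with noindex."""
    finder = _RobotsMetaFinder()
    finder.feed(html_text)
    return finder.noindex
```

Running both against each URL in your sitemap gives a quick list of pages that you are submitting to the engines while also telling them to stay away.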

