Orphan pages are web pages that users can no longer reach by navigating the site. No internal link gives search bots a direct path to them, yet they still live on the website. There are countless reasons why this happens, but I want to show how to find orphan pages and how to fix them for crawl budget optimization in SEO. The tools that I use for this analysis are the Screaming Frog SEO Spider and the Screaming Frog Log File Analyser. Comparing a recent crawl against a log file will show the orphan pages that ultimately hurt your SEO efforts. If you were wondering how to find orphan pages on a website, you’ve come to the right place.
If you are looking for help with these steps below, I’m a Boston SEO who can help with this.
Import a Crawl
Downloading a recent crawl from Screaming Frog is the first step in this process. Next, import the URL data by clicking on the import tab in the log file tool. Drag and drop the new crawl into the log file section, so you have the logs and crawl data together. From there, use the URLs tab to uncover the issues if they exist.
Not in URL Data
A regular crawl with Screaming Frog falls short on its own in this type of analysis, because a crawler only discovers pages by following links. When you combine the “Not in URL Data” results with a log file, you will see if orphan pages exist: they are the URLs that show up in the logs but not in the crawl. Orphan pages live on your domain, but no clear link path leads to them, so Google and Bing cannot reach them by following your site structure.
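The comparison behind this step can be sketched in a few lines of Python. This is only an illustration of the idea, not how Screaming Frog implements it. The function names and the example.com URLs are assumptions made up for the sketch; in practice the two lists would come from your crawl export and your log file.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so crawl and log entries compare cleanly."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"{parts.netloc.lower()}{path}"


def find_orphan_candidates(crawl_urls, log_urls):
    """Return URLs that appear in the logs but were never found by the crawl."""
    crawled = {normalize(u) for u in crawl_urls}
    logged = {normalize(u) for u in log_urls}
    return sorted(logged - crawled)


# Hypothetical sample data standing in for a crawl export and a log file.
crawl_urls = ["https://example.com/", "https://example.com/shoes/"]
log_urls = [
    "https://example.com/shoes/",
    "https://example.com/promo/spring-sale/",
]

print(find_orphan_candidates(crawl_urls, log_urls))
```

Anything this returns is only a candidate. A URL can be missing from the crawl for other reasons, so it is worth spot checking a few before acting on the list.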
Why This Matters
Crawl budget exists with Google and Bing. If you want to learn more about what this is, you can click on this link that talks about crawl budget. Neither search engine can crawl a website for an infinite amount of time, so you can’t afford to waste their crawl budget. Cleaning up orphan pages is a fantastic step toward giving them a clean crawl the next time they visit your website.
How to Address This Issue
The steps that you need to take vary depending on how many web pages have the issue. In this example below, I have 60k pages that have this problem. On my own website, I only have a handful of orphan pages, so my strategy will be different.
The first step is to check whether the webpage should be indexable or non-indexable. If the pages serve no value for your audience, you should NOINDEX them. A NOINDEX tag will take those pages out of the SERP in Google and Bing, so you don’t have to worry about people discovering them from a search query. If the webpage still has value for your audience, you should keep it set to index. A webpage is indexable by default, so you will not have to do anything to keep it that way.
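A NOINDEX tag is usually a robots meta tag in the page head, such as <meta name="robots" content="noindex">. If you want to confirm which orphan pages already carry it, a small check like the one below can help. It is a minimal sketch that only looks at the meta tag in HTML you have already downloaded; the class and function names are my own, and it does not look at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def is_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source for illustration.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindex(page))
```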
The second step is to look at the theme of the webpage. Let’s say that many of these orphan pages are past sales that serve no purpose today. We can 301 or 302 redirect these pages to the proper section of the website. A 301 redirect acts as a permanent move, so the page authority will pass to the new location. That makes it an excellent strategy, because it can strengthen the destination page and drive more traffic to it. A 302 redirect signals a temporary change, so you will need to revisit it in the future.
The third step is to look for a shared URL structure among the orphan pages, if one exists. If I take the sale example from above, I will look for a common path. Let’s say that the sale pages have /promo/ in their URL path. I can block this path by using the robots.txt file and a disallow command to discourage Google and Bing from looking there. The disallow command can be the saving grace if your URLs follow a path, because it can solve the problem with one line of code.
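Before publishing a rule like this, it is worth testing which URLs it actually blocks. The sketch below uses Python's built-in robots.txt parser against a sample file. The /promo/ rule follows the example above, while the URLs and the is_blocked helper are assumptions for illustration. Real search engines may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt with the single disallow line for the sale pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /promo/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical URLs: one old sale page and one page that should stay crawlable.
print(is_blocked(ROBOTS_TXT, "https://example.com/promo/spring-sale/"))
print(is_blocked(ROBOTS_TXT, "https://example.com/shoes/"))
```

Running a handful of important URLs through a check like this is a cheap way to make sure the one line of code does not block more than you intended.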
The fourth step requires connecting the webpage to your website. I put this option last because proper site architecture should be the priority in SEO. A quick connection acts more like a quick fix to a big issue. The connection can be a link from the homepage that ties the orphan pages back into the website. I urge you to check your site architecture to make sure you have a clear path going forward.
Orphan pages can exist on a domain for many reasons. One common reason why these pages exist is the URL structure that they live under. As a test, you can take apart a URL to see if that next level up exists. If it does not, you can then assume that everything behind that is an orphan page. These pages take away crawl budget from Google and Bing and thus take away their attention from crawling critical pages on your site. By combining two tools from Screaming Frog, you can discover where these pages exist and figure out what to do with them.
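The "take apart a URL" test can be sketched like this. The helper names, the example URL, and the status codes are all hypothetical; the status lookup stands in for whatever your crawl or a manual check reports for each parent level.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url: str) -> list[str]:
    """List each level above a URL, from the nearest parent up to the top section."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    for depth in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:depth]) + "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


def missing_parents(url: str, status_by_url: dict) -> list[str]:
    """Return parent levels that did not answer with a 200.

    Parents absent from status_by_url are treated as missing too.
    """
    return [p for p in parent_urls(url) if status_by_url.get(p) != 200]


# Hypothetical status codes gathered from a crawl or a manual check.
statuses = {
    "https://example.com/promo/2019/": 200,
    "https://example.com/promo/": 404,
}
page = "https://example.com/promo/2019/spring-sale/"
print(parent_urls(page))
print(missing_parents(page, statuses))
```

A missing parent level is a hint rather than proof, so treat the pages underneath it as likely orphans to verify against the log data.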
Finding orphan pages on a website and fixing them can do wonders for your SEO, but it may be hard to justify the optimization immediately. Pages created a long time ago whose links have since been removed are also a common reason why orphan pages exist. The status code on these pages will most likely be a 200, because Google and Bing can still access them.
A clean path for Google and Bing won’t directly increase the rank for a given keyword, but it will have an overall positive effect on your website. You can measure this by seeing if a critical section of the site has seen an uplift in crawling activity. An increase in crawl activity can increase rankings and traffic, so it becomes a cause and effect situation.
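One way to measure that uplift is to count bot requests per site section before and after the cleanup. The sketch below assumes you have already pulled the requested paths for a search bot out of two log periods; the paths and function names are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_of(url_or_path: str) -> str:
    """Map a URL or path to its top-level section, e.g. /shoes/x/ -> /shoes/."""
    path = urlsplit(url_or_path).path
    segments = [s for s in path.split("/") if s]
    return f"/{segments[0]}/" if segments else "/"


def hits_by_section(paths) -> Counter:
    """Count requests per top-level section."""
    return Counter(section_of(p) for p in paths)


def crawl_change(before, after) -> dict:
    """Return the change in request count per section between two periods."""
    b, a = hits_by_section(before), hits_by_section(after)
    return {s: a[s] - b[s] for s in sorted(set(b) | set(a))}


# Hypothetical bot requests from the logs, before and after the cleanup.
before = ["/promo/a/", "/promo/b/", "/shoes/x/"]
after = ["/shoes/x/", "/shoes/y/", "/shoes/z/"]
print(crawl_change(before, after))
```

In this made-up data the bot stops spending requests on /promo/ and spends more on /shoes/, which is the kind of shift you would hope to see after cleaning up orphan pages.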