In a perfect world, search engines would know about every piece of content on the web and would be ready to deliver high-quality results to users in the blink of an eye. Unfortunately, we don’t live in that world.
Even with the vast resources and power available to the likes of Google and Bing, it’s not possible for these search engines to promptly crawl, index, and evaluate every single thing on the internet. This has been common knowledge in the search community for a while, but it was recently addressed on Twitter by Google’s Webmaster Trends Analyst John Mueller.
To balance delivering fast results with discovering new content, search engines allocate a certain “budget” to crawling each site. Corners must be cut, and sacrifices must be made. Once a search engine has run out of time on your site, it moves on to the next one.
If your website is struggling to get indexed, however, there are a few solutions, collectively referred to as “crawl budget optimization.” You can follow these basic steps to optimize your website’s crawl budget.
Review your current index status
The first step in determining whether your site is getting properly indexed is to review your current results count. This is a quick way to see how many of your pages are currently in the search index. A simple search operator returns all indexed pages from a single site: go to Google and search for “site:example.com,” substituting your own domain. Then look for the “About X results” count near the top of the results page.
Is this number what you’d expect to see? Is it roughly the amount of content pages you’ve created? Too few or too many could be indications that your website needs crawl budget optimization.
Develop a robots.txt & XML sitemap
Search engines use two roadmaps of sorts to crawl and index your site. These files live at the root-domain level and establish which pages to crawl, which pages not to crawl, and a list of all your content pages. These files are your robots.txt and your XML sitemap.
The robots.txt file is a list of rules and directives for search engines to follow when crawling your site. If you have content that requires a login, internal search pages, or paginated content that you don’t want search engines spending crawl budget on, you will want to list those URLs and paths here. Keep in mind that robots.txt is publicly readable, so it’s not an appropriate place to hide truly sensitive information.
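As a sketch, a robots.txt that blocks internal search and login paths and points to the sitemap might look like the following (the paths and domain are hypothetical examples, not a recommendation for any particular site):

```
# Rules apply to all crawlers
User-agent: *
# Keep crawlers out of internal search and login pages (example paths)
Disallow: /search/
Disallow: /login/
# Tell crawlers where to find the sitemap
Sitemap: https://example.com/sitemap.xml
```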
Make sure to use Google Search Console’s robots.txt Testing Tool to confirm everything is configured correctly. The robots.txt file should not be the only mechanism for keeping results out of Google’s index, though; noindex directives are recommended for that. The job of robots.txt is to make sure search engines only request your highest-priority pages.
Second, the XML sitemap is essentially a checklist of URLs for search engines to crawl. You can also provide search engines with additional information in an XML sitemap, such as images and alternate-language versions of pages, though that’s a more advanced use case. As long as you have an accurate (and regularly updated) XML sitemap posted and accessible, search engines should be able to use that list.
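To make the structure concrete, here is a minimal sketch of generating a sitemap with Python’s standard library. The domain and page URLs are placeholders; a real sitemap would be generated from your site’s actual content pages.

```python
# Minimal sketch: build an XML sitemap string with Python's standard library.
# example.com and the page list below are hypothetical placeholders.
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Return a sitemap XML string for the given list of page URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for page in urls:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = page
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    "https://example.com/",
    "https://example.com/blog/",
])
print(sitemap)
```

The output is a valid `urlset` document that you would save as sitemap.xml at your root domain.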
Monitor your site stats
Once you have your robots.txt created and a fresh XML sitemap posted, submit the sitemap and monitor it in Google Search Console. The Sitemaps Report is where you tell Google the location of your sitemap and monitor the index status of your site. Check the count of URLs in the sitemap and compare it to the number you see when using the “site:example.com” operator.
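One side of that comparison can be automated: counting the URLs in your sitemap. The sketch below (using only the standard library, with a hypothetical inline sitemap standing in for your real file) counts the `<loc>` entries, a number you can set against the “site:” results count.

```python
# Minimal sketch: count the URL entries in an XML sitemap.
# The sample document below is a hypothetical stand-in for a real sitemap file.
import xml.etree.ElementTree as ET

def count_sitemap_urls(sitemap_xml):
    """Count the <loc> entries in a sitemap document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url/sm:loc", ns))

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>"""

total = count_sitemap_urls(sample)
print(total)  # number of URLs listed in the sitemap
```

If this count and the “site:” count diverge sharply in either direction, that’s a signal to dig into which pages are missing from (or unexpectedly present in) the index.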
DEG has SEO services to help you
If you’re looking for a digital marketing agency with the SEO knowledge and experience to fix issues like crawl budget optimization, site speed, and content strategy, contact us today. Our Media + Search team has expertise in helping businesses improve their organic and paid search presence.