Robot Crawlability

What It Is

The most critical technical web marketing task is making your website accessible to search engine spiders, or robots, which need to crawl through the various pages, images, and other media on your website. To support robot crawlability, the majority of your content should be placed as text within HTML tags.
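
As a minimal illustration (the file name and wording here are invented for the example), a heading kept as plain HTML text is easy for a spider to read, while the same words baked into an image are not:

    <!-- Crawlable: the heading text lives in the HTML itself -->
    <h1>Spring Sale: 20% Off All Widgets</h1>

    <!-- Hard to crawl: the same words exist only as pixels in an image -->
    <img src="/images/spring-sale-banner.png" alt="">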

That sounds simple. Occasionally, though, some content will need to be dynamically served via a JavaScript function, placed in Flash, or placed within an image.

If you need to serve content through dynamic elements like JavaScript, make sure spiders can access the JavaScript code that renders that content. Spiders can typically run that code the same way your human visitors' browsers do, which lets them see the content the script produces.
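
As a sketch of what this looks like (the file paths are hypothetical), the content below exists only after the script runs, so the spider must be allowed to fetch and execute the script rather than having it blocked in robots.txt:

    <!-- The container starts empty; products.js fills it in after the page loads -->
    <div id="product-list"></div>

    <!-- This script must not be disallowed in robots.txt, or spiders
         will see only the empty <div> above -->
    <script src="/js/products.js"></script>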

Another way content can be hidden from search engine spiders is by placing it within Flash. While the use of Flash is trending downward, a large number of websites still employ it, and any content locked inside Flash risks being invisible to search spiders.

A better solution is to move away from Flash, which embeds content outside of the robots' view, to an approach that lets search engine spiders see your content. jQuery or HTML5 elements can give you animations and effects similar to Flash while keeping the text accessible to search engine spiders.
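
As a rough sketch, assuming a simple fade-in effect of the kind that might previously have been built in Flash, CSS (an HTML5-era feature) can animate a plain HTML heading while the words remain ordinary, crawlable text:

    <style>
      /* The heading fades in over one second, but the text itself
         is always present in the HTML for spiders to read */
      .promo-heading {
        animation: fade-in 1s ease-in;
      }
      @keyframes fade-in {
        from { opacity: 0; }
        to   { opacity: 1; }
      }
    </style>

    <h1 class="promo-heading">New Fall Collection</h1>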

A final way content can be blocked from search engine spiders is by placing text within an image. Designers often do this when text needs to appear in a special font, because an image ensures the text looks the same to every visitor. Unfortunately, search engine spiders cannot (easily) read the text contained in an image. As an alternative, you can use a font from Google's font library. A library font lets the words stay as real text that Google can access while ensuring that all visitors see the content in the appropriate font.
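
Here is a minimal sketch of pulling in a Google font (Lato is just an example choice) so the words stay as real text:

    <!-- Load the font from Google's font library -->
    <link rel="stylesheet"
          href="https://fonts.googleapis.com/css?family=Lato">

    <style>
      /* Apply the font to normal, crawlable HTML text */
      h1 { font-family: 'Lato', sans-serif; }
    </style>

    <h1>Welcome to Our Store</h1>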

Providing Alternative Text

If the content absolutely must be placed in Flash, in an image, or in a JavaScript function where a search engine spider will be unable to view it, then alternative text, or alt text, should be provided. For example, on an image, alt text can be used to tell Google what words the image contains. For content contained in JavaScript, you can place the content in a <noscript> tag so that Google can still find the right text. For Flash, you can provide an alternative means of accessing the content. For instance, provide the content in a transcript or plain text format that robots (and users who do not have Flash enabled) can more easily access.
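
For instance (the image file and wording below are hypothetical), alt text and a <noscript> fallback might look like this:

    <!-- The alt attribute tells spiders what words the image contains -->
    <img src="/images/free-shipping-banner.png"
         alt="Free shipping on orders over $50">

    <!-- Content rendered by JavaScript, with a plain-text fallback
         for spiders and visitors who cannot run the script -->
    <div id="shipping-details"></div>
    <script src="/js/shipping-details.js"></script>
    <noscript>
      <p>Free shipping on orders over $50. Orders ship within two business days.</p>
    </noscript>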

Hidden Content

A final consideration when thinking about crawlability is hidden content. For example, it is common practice on websites to place some content behind tabs. However, this technique can hide your content from Google, which can get you penalized. Google's general rule of thumb is that if the content is hidden but still available to users (for instance, by clicking a tab), then that is acceptable behavior. If the content is hidden and there is no obvious way for a user to reveal it, then you run the risk of robots never finding it (and of users never seeing it either).
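
Here is a sketch of acceptable tabbed content, assuming a simple show/hide approach: the text of the panel is present in the HTML, and the user can reveal it by clicking the tab.

    <!-- The panel exists in the HTML; only its visibility changes -->
    <button onclick="document.getElementById('specs').style.display='block'">
      Specifications
    </button>

    <div id="specs" style="display: none;">
      <p>Dimensions: 10 x 5 x 2 inches. Weight: 1.2 pounds.</p>
    </div>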

Checking Crawlability

Google Search Console

In Google Search Console, you can use the Fetch As Google tool to see a page the way Google's spiders see it. In Google Search Console, click Crawl, then click Fetch As Google.

Google Search Console - Fetch As Google

Input your page's URL, then click Fetch. In the results, you can confirm that Google can find the content you expect it to find.

Google Search Console - Fetch As Google Search Box

Google also offers a Blocked Resources report, which shows you what content you are preventing Google from accessing. For example, the report will show you if you are blocking a JavaScript file that is responsible for rendering some of your website's content.
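
For example (the path is hypothetical), a robots.txt rule like the one below blocks an entire script directory; any content generated by scripts in that directory would be invisible to Google, and the Blocked Resources report surfaces exactly this kind of problem:

    User-agent: *
    # Blocking this directory also blocks the JavaScript that renders page content
    Disallow: /js/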

Google Search Console Blocked Resources

Bing Webmaster Tools

Bing also offers the ability to fetch a page on your website as Bingbot. After logging in, go to Diagnostics & Tools, then click Fetch as Bingbot.

Bing Webmaster Tools - Fetch As Bingbot

After inputting your URL in the search box, you will see a copy of the page's code as Bing sees it. Search through this output to make sure the most important content you need Bing to find is present.

Bing Webmaster Tools - Fetch As Bingbot

Resources