At some point, the humans and robots visiting your website will attempt to access a page that doesn't exist. This can happen for a variety of reasons, including a user misspelling a word in the URL or somebody linking to your website with an incorrect URL. You may also have recently changed URLs on your website and something went wrong with your redirects. Regardless of why somebody reached a "not found" error page, your website must handle the error correctly, both for the robots accessing the page and for the people who see the error.
Correct handling begins with the server sending the appropriate HTTP response status code to tell a robot that the requested URL is in error. A person never sees the response code, but search engines and other automated services use it to understand the page and decide how to handle it.
The proper HTTP response status code is either a 404 (not found) or a 410 (gone). A 404 response code is appropriate in almost all cases, as it clearly indicates that the requested page is unavailable. However, if you deliberately remove a page from your website, it is more appropriate for the URL of the removed page to return a 410. Unlike a 404, a 410 indicates that you specifically and intentionally removed the page, which clarifies why it is no longer available.
That said, there is little real difference between the way 404 pages and 410 pages are treated, whether by Google or by people visiting your website. Decide which practice makes the most sense for your website and apply it consistently. For example, if you return a 410 response for blog posts you’ve removed, but a 404 response for all other content, continue that practice.
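To make the 404 versus 410 choice concrete, here is a minimal sketch using Python's built-in `http.server` module. The page lists (`LIVE_PAGES`, `REMOVED_PAGES`) and the names `status_for_path` and `SiteHandler` are made up for illustration; a real website would look this up in its CMS or router, and most platforms have their own way of setting the status code.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical site inventory, for illustration only.
LIVE_PAGES = {"/", "/about", "/blog/seo-basics"}
REMOVED_PAGES = {"/blog/old-announcement"}  # pages removed on purpose


def status_for_path(path: str) -> HTTPStatus:
    """Pick the response status code for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK         # 200: the page exists
    if path in REMOVED_PAGES:
        return HTTPStatus.GONE       # 410: intentionally removed
    return HTTPStatus.NOT_FOUND      # 404: anything else


class SiteHandler(BaseHTTPRequestHandler):
    """Sends the chosen status code along with a small HTML body."""

    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore any query string
        status = status_for_path(path)
        body = f"<h1>{status.value} {status.phrase}</h1>".encode("utf-8")
        self.send_response(status)  # the status line robots rely on
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, you could run `HTTPServer(("127.0.0.1", 8000), SiteHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) and request a few paths. The point of the sketch is only that the status code is set explicitly for every response, following one consistent rule.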
One common problem with website error messages is sending a non-error status code on an error page. The default HTTP response status code for non-error pages is 200, which means the page is "OK". If an error page returns the default 200 instead of a 404 or 410, the response signals that there is nothing wrong with that page. This creates a "soft 404": a page that looks like an error page but doesn’t return the response code that defines it as an error.
While most humans won't be able to tell the difference between a soft 404 and a true 404 error page, a soft 404 can present a problem for automated programs accessing your website, like search engine robots. Sending the incorrect status code means robots don't know to treat the error page as an error page. This can lead Google to index error pages (which you'd rather it not do) or to waste time crawling pages of your website you'd rather it not look at.
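As a rough illustration of what a soft 404 looks like to a program, the sketch below flags a response that says 200 while the page text reads like an error. The phrase list and the `is_soft_404` name are assumptions made for this example; this is a simple heuristic and not how Google or any other search engine actually detects soft 404s.

```python
# Hypothetical phrases that suggest a page is really an error page.
ERROR_PHRASES = ("page not found", "error 404", "no longer available")


def is_soft_404(status_code: int, body: str) -> bool:
    """Return True when a page claims success (200) but reads like an error."""
    if status_code != 200:
        return False  # a real 404 or 410 is already reported correctly
    text = body.lower()
    return any(phrase in text for phrase in ERROR_PHRASES)
```

For example, `is_soft_404(200, "<h1>Page Not Found</h1>")` is flagged, while the same body sent with a 404 status is not, because in that case the server has already told robots the truth.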
Along with properly reporting the error to search engine robots, you also want to make sure that people who encounter the error clearly understand that an error occurred. You also want people to stay on your website instead of leaving because the page they were seeking was not found. This means clearly stating that an error occurred and giving visitors a way to continue beyond the error page (preferably to another related page on your website).
For more about handling errors for humans, please review Elementive's guide to creating delightful error experiences.
While there are many ways of checking for 404 errors, one method is through Google Search Console. In Google Search Console, you can see the 404 errors Google has encountered while crawling your website. Under Crawl, click Crawl Errors.
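Another of those many ways is a small script of your own. The sketch below, which uses only Python's standard library, requests each URL in a list and reports the ones that answer with a 404 or 410. The function names `fetch_status` and `find_missing` are made up for this example, and the sketch assumes you already have a list of URLs to check (from a sitemap export, for instance).

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for a URL (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return error.code
    # Connection failures (urllib.error.URLError) are left unhandled on purpose.


def find_missing(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Map each URL that answers 404 or 410 to its status code."""
    missing = {}
    for url in urls:
        status = fetch(url)
        if status in (404, 410):
            missing[url] = status
    return missing
```

Calling `find_missing(["https://example.com/", "https://example.com/old-page"])` would return a dictionary of the URLs that came back as 404 or 410. The `fetch` parameter exists so the checker can be tried with a stand-in function before pointing it at a live website.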
For more about 404 errors and the tools you can use to check 404 errors, please watch the following video.
Want help improving your website’s technical SEO factors? Contact us today to discuss how we can help review and improve your current technical structure.