Duplicate Content & Canonical URLs

What It Is

Duplicate content is a page, or a substantial section of a page, that appears identically or nearly identically at multiple URLs on your website. For example, on an ecommerce site, the three URLs below could list the same products, each in a slightly different order. Together, these three pages would create duplicate content.

http://www.domain.com/product-list.html
http://www.domain.com/product-list.html?sort=color
http://www.domain.com/product-list.html?sort=price

The second and third example URLs contain a sort parameter ("?sort=color" or "?sort=price"). That parameter creates a slight difference between these pages: the order in which the products are listed. But these pages would still have the same products, the same images, the same text and, likely, the same title and description tags.
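To see why these URLs amount to one page, it can help to compare them with the sort parameter ignored. The Python sketch below assumes that `sort` only changes the order of the listing and never the content itself; the `PRESENTATION_PARAMS` set and the `content_key` function are hypothetical names for illustration, not a description of how any search engine actually groups URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: "sort" only reorders the listing, so it can be ignored
# when deciding whether two URLs show the same content.
PRESENTATION_PARAMS = {"sort"}

def content_key(url: str) -> str:
    """Return the URL with presentation-only query parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in PRESENTATION_PARAMS]
    # An empty query string means urlunsplit omits the "?" entirely.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

urls = [
    "http://www.domain.com/product-list.html",
    "http://www.domain.com/product-list.html?sort=color",
    "http://www.domain.com/product-list.html?sort=price",
]

# Group URLs that differ only by presentation parameters.
groups = {}
for url in urls:
    groups.setdefault(content_key(url), []).append(url)

for key, members in groups.items():
    print(key, "->", len(members), "URLs")
```

All three URLs collapse to the same key, which is exactly the duplication described above.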

With that much similarity, these three URLs would be considered duplicate versions of the same page. That duplication can confuse Google as its crawlers try to decide which page to show in search results. In many cases, the page Google chooses to show may not be the one you would prefer people find. In this example, you may prefer people find the first URL rather than the sorted versions of the page. In rare cases, such as when content is duplicated deliberately to manipulate rankings, Google may also take action against your website.

Resolving Duplicate Content

There are many ways to resolve duplicate content, including adding redirects to consolidate the three versions into one, removing the duplicated pages altogether (see our article about avoiding errors when removing pages), or rewriting the duplicated page to make it more distinct.

Those methods assume you do not want to keep the duplicated version of the page. In the example above, however, you would likely want to keep the three pages on your website. Providing a way to see the same products sorted in slightly different ways may actually be helpful to the people visiting your website.

In this scenario, where you want to keep the duplicated pages, the best method is to define a canonical version of the URL. A canonical URL is the official, or preferred, version of a URL.

Defining A Canonical URL

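When you have duplicate versions of a page, as in the example above, a canonical tag (more formally called a canonical link element) tells search engines which URL you prefer them to use. Search engines treat it as a strong hint rather than a binding instruction, so they may occasionally choose a different URL.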

In the above example, you might consider the first URL to be the official or preferred version because it has no sort parameter, which makes the URL cleaner. However, if sorting by color is the most popular choice among your users, you might prefer the second URL as the canonical instead. Alternatively, you may find that the third URL (sorted by price) gets the most attention from other websites or on social networks, in which case the third version might make more sense as the canonical URL.

Canonical URL Example

After you select the canonical version of the URL, the canonical tag needs to be added to each potentially duplicated page. In the example above, any duplicated URLs would contain a canonical tag referencing the canonical URL you selected.

The canonical URL can be defined in two ways. The most common is to use a <link> element in the <head> of your web page. Here is an example of the canonical code, with the canonical URL in the href attribute.

<link rel="canonical" href="http://www.domain.com/canonical-url" />
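If your pages are generated by a script or template, the element can be produced from a lookup of the canonical URLs you have chosen. This is a minimal Python sketch, assuming a hypothetical `CANONICAL_FOR` dictionary in which the first URL from the earlier example was picked as canonical; pages missing from the lookup fall back to referencing themselves. The URL is HTML-escaped so that characters such as `&` in a query string are written safely inside the attribute.

```python
from html import escape

# Hypothetical lookup: each duplicate URL -> the canonical URL chosen for it.
CANONICAL_FOR = {
    "http://www.domain.com/product-list.html": "http://www.domain.com/product-list.html",
    "http://www.domain.com/product-list.html?sort=color": "http://www.domain.com/product-list.html",
    "http://www.domain.com/product-list.html?sort=price": "http://www.domain.com/product-list.html",
}

def canonical_link_element(page_url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    # Assumption: pages not in the lookup are treated as their own canonical.
    canonical = CANONICAL_FOR.get(page_url, page_url)
    # Escape &, <, > and quotes so the URL is safe inside the href attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'
```

For example, `canonical_link_element("http://www.domain.com/product-list.html?sort=price")` returns the same element as the unsorted page, so every variant points search engines at one URL.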

The other option is to send a Link header in the HTTP response. This is useful for non-HTML files, such as PDFs, but it typically requires technical support to add to your website.

Link: <http://www.domain.com/canonical-url>; rel="canonical"
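How the header gets added depends on your server. As one illustration, here is a minimal Python WSGI sketch that serves a non-HTML file with the canonical Link header attached. The `app` function, the placeholder PDF bytes, and the `CANONICAL_URL` value are assumptions for illustration only.

```python
# Hypothetical canonical URL for the file being served.
CANONICAL_URL = "http://www.domain.com/canonical-url"

def canonical_link_header(url: str) -> tuple[str, str]:
    """Build a (name, value) pair for an HTTP Link header with rel="canonical"."""
    return ("Link", f'<{url}>; rel="canonical"')

def app(environ, start_response):
    """Minimal WSGI app that serves a non-HTML file plus a canonical Link header."""
    body = b"%PDF-1.4 placeholder"  # stand-in bytes, not a real PDF
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        canonical_link_header(CANONICAL_URL),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, you could pass `app` to `wsgiref.simple_server.make_server` from the standard library; on many production sites the same header is set in the web server or CDN configuration instead.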

Supporting The Canonical Elsewhere

You should not rely on the canonical tag as the only means of communicating your URL preferences to search engines. Links throughout the rest of your website should link to the canonical (or official) version of the page as well. This avoids sending mixed signals to the search engines.

For example, if you define http://www.domain.com/product-list.html?sort=color as the canonical URL, but most of the links on your website reference http://www.domain.com/product-list.html, you would be sending conflicting signals about which version of the URL is really the definitive, authoritative, canonical one. As much as possible, make sure that links to duplicated pages within your website use the canonical version of the URL.
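A rough way to check this is to scan a page's links and flag any that point at a non-canonical version. The Python sketch below uses the standard library's `html.parser`; the `CANONICAL_FOR` mapping (which here treats the `?sort=color` URL as canonical, matching the example above), the `LinkCollector` class, and the `non_canonical_links` function are hypothetical. It only inspects `<a href>` links in the HTML you pass in; it does not crawl a site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical mapping: duplicate URL -> canonical URL you selected.
CANONICAL_FOR = {
    "http://www.domain.com/product-list.html": "http://www.domain.com/product-list.html?sort=color",
    "http://www.domain.com/product-list.html?sort=price": "http://www.domain.com/product-list.html?sort=color",
}

class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href> on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.hrefs.append(urljoin(self.base_url, href))

def non_canonical_links(html: str, base_url: str) -> list[tuple[str, str]]:
    """Return (found_link, preferred_canonical) pairs for links to duplicate URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return [(href, CANONICAL_FOR[href]) for href in collector.hrefs
            if href in CANONICAL_FOR and CANONICAL_FOR[href] != href]

page = """
<a href="/product-list.html">All products</a>
<a href="/product-list.html?sort=color">By color</a>
<a href="product-list.html?sort=price">By price</a>
"""

for found, preferred in non_canonical_links(page, "http://www.domain.com/index.html"):
    print(f"{found} should link to {preferred}")
```

Run against the sample markup, it flags the links to the unsorted and price-sorted URLs and leaves the link to the canonical URL alone.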

Resources