Duplicate content is a major SEO concern, up there with dodgy links and Google penalties. It can damage any site's organic traffic, and everyone involved with SEO understands this. That doesn't mean duplicate content is easy to avoid, though. In spite of your best efforts, your site might still suffer from duplicate content issues.
This guide is designed to help you resolve those issues. We're going to point you to the main ways that duplicate content can occur, then get into the nitty-gritty of what you can do to avoid and resolve duplicate content issues. First, though, it's worth explaining what duplicate content is and why it matters.
The best way to explain duplicate content is to look at how Google themselves define it. In their support guidelines on the subject, they offer the following definition:
‘Substantive blocks of content within or across domains that either completely match other content or are appreciably similar.’
That's simple enough, and so is the reason duplicate content matters: it comes down to what Google aim to provide their users. The search engine strives to index and display pages with distinct information, as part of their ongoing drive to ensure a better user experience.
Pages with duplicate content do not qualify as having distinct information. As such, Google will filter those duplicating pages. That means that only one of the pages featuring duplicate content will be listed. That can have a profound negative effect on a domain’s organic traffic. Pages that would otherwise drive more traffic to a site won’t be listed at all.
It's a common misconception that Google impose penalties for duplicate content. That's not the case, but they will act if they suspect malicious use of duplicate content, i.e. content used to manipulate their rankings. In that case they:
‘Make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index. In which case it will no longer appear in search results.’
By now it should be obvious that you want to avoid duplicate content on your site. Even if you're careful to do so, it can still occur, and there are many ways this can happen.
As we’ve already mentioned, duplicate content can be deliberately featured on a domain. Usually as a way of trying to trick or manipulate Google’s rankings. Every SEO pro now knows how clever Google’s algorithms are. Only the most foolish or uncaring of them would think they could get away with such manipulation.
It’s far more often the case that duplicate content on a site has developed naturally. That will either be due to certain technical problems or simple human error. It’s important to understand the main ways in which this can happen. It will help you to identify your own duplicate content issues. It will also make it easier to choose the best possible solution.
The causes of duplicate content that we're going to discuss are as follows:

- URL parameters, such as those created by product filtering and visitor tracking
- Crossover between ecommerce category pages
- Copied product descriptions
- Session IDs and untidy CMS URLs
- Printer friendly pages
- Human error in content creation
URL parameters are like suffixes added to the end of a page’s URL. They occur in many situations and often don’t change the content of a page a great deal or even at all. The problem is that to a search engine a URL with a different parameter at the end is a different URL. If the content linked to by the ‘two’ URLs is the same, Google will identify it as duplicate content.
A prime example of this comes from filtering products on ecommerce sites. Almost all of those types of sites let customers filter products. They may wish to show only products within a certain price range or made of a particular material. The act of filtering the products adds a URL parameter to the URL. The content shown – the products etc. – will all be duplicated elsewhere, however.
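To illustrate, here are some hypothetical URLs (the domain and parameter names are made up) showing how one category page can spawn several parameterised variants:

```
https://example.com/necklaces
https://example.com/necklaces?material=silver
https://example.com/necklaces?material=silver&price=under-50
```

To a search engine these are three distinct URLs, even though the products they display overlap almost entirely.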
Another example is in the case of tracking. Tracking parameters let you track the sources of your site visitors. This can be crucial for monitoring ROI of different SEO efforts. They may look something like this: ‘/?source=rss’. They have no impact on the content of a page but also look to a search engine like a unique URL.
Another problem particular to ecommerce sites is that of category page crossover. Many sites will have different category pages that display primarily the same products. This is often done for well-meaning and understandable reasons.
For instance, a gift site might have categories named ‘Gifts For Him’ and ‘Father’s Day Gifts’. The two categories may well attract different customers. The products displayed on the category pages, though, will be almost identical. That is all that will matter to Google and they might well only index one of the pages.
One level down on ecommerce sites from category pages are product pages. These can also be a common source of duplicate content issues. Visitors to such pages will expect there to be a short product description. It will be how the product’s features and characteristics are sold to customers.
Sites that sell lots of products often don’t create unique descriptions for each. Many firms simply copy and paste generic information. Often that has been provided by a supplier or manufacturer. That leads to loads of duplicate content within and across different domains.
The biggest issues in this case arise if your site sells the same products as a much bigger retailer like Amazon. Copied descriptions might cause your product page to duplicate content found on Amazon, and Google will almost certainly index Amazon's page rather than yours.
As well as URL parameters, there are a couple of other technical URL quirks that can lead to duplicate content. The first comes in the shape of 'session IDs'. These are used in URLs when site visitors are given a 'session'. That's often so that they can add items to a shopping cart and have them stay there.
Session IDs are added to every internal link as a visitor travels your site. That creates lots of URLs which a search engine may view as duplicate content. In a similar vein, untidy URLs as part of a CMS can have a similar effect. URLs with parameters for category and article which change order are prime examples.
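As a hypothetical illustration (the domain and parameter name are invented), a single cart page might be crawled under several session-stamped URLs, all showing the same content:

```
https://example.com/cart?sessionid=af3b9c21
https://example.com/cart?sessionid=77e0124f
```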
Your CMS might well create printer friendly pages. These pages will be linked to on your site from article pages and elsewhere. Google will be able to find these pages unless you specifically stop them (more on that later).
Google will filter and index only one of the duplicate pages. That might be the original or the printer friendly version. You want your original page to rank, not the printer friendly one. The latter won’t have all your ads, links and other content.
Most of the above are technical causes of duplicate content issues. Where human error comes in is in the area of content creation. Almost every site these days has a blog or similar informational resource. It helps them to provide useful information to visitors. Blogs can often be home to lots of duplicate content.
This may be due to trusting content creation to someone who doesn't understand the problems duplicate content can cause. They might copy or recreate content without knowing the SEO issues they're creating. Their errors could be as small as always using the same title tags, or as large as directly copying content from other sites.
You should now have an idea of where your issues with duplicate content may have come from. The above are all causes of those issues which are common to many sites. Understanding them and knowing which have affected your site is crucial. That’s because the different causes lend themselves to different solutions.
We're going to run through some of the best ways to resolve issues with duplicate content. We'll flag up which of the issues and causes we've already mentioned fit best with each solution as we go. Our solutions fit within two categories: steps to prevent duplicate content arising in the first place, and practical fixes for issues your site already has.
In an ideal world you want to avoid issues with duplicate content before they arise. Knowing about the causes of the issues we’ve discussed is a great starting point. Having that knowledge can help you take steps to ensure that no new content will fall foul of the same problems.
You can, for example, disable session IDs in your system settings. That will prevent the duplicate URL issues those can cause. You could choose to forgo printer friendly pages on your website altogether; it's not as if many people have cause to print out pages nowadays anyway. Hash-based (#) tracking can also be a good alternative to parameter-based tracking, since search engines ignore everything after the hash in a URL.
Having learnt about the causes of duplicate content, you’re in a position to educate others. They can include web developers or your product team. You can explain to them the issues related to crossover in product categories. That way they’ll know to arrange products accordingly. Freelance or in-house content creators can also be briefed on keeping things unique.
That is in an ideal world. In reality, you may not be able to get ahead of all your duplicate content issues. In those circumstances you need some practical solutions. They will be what can help you to recover from the issues you’re already suffering from.
Our guide so far should have shown you where your duplicate content issues may have come from. We've now also offered some tips for preventing more issues cropping up. What's left is to suggest some courses of action if your site already has issues with duplicate content. There are plenty of different options open to you.
Canonical URLs can help if your issue is with different URLs leading to the same content. As in the case of filtering parameters or category pages, as described earlier. A canonical URL is the ‘correct’ URL. It is the URL of the page that you want Google to index out of those leading to the same content. You need to decide in each case which page that is.
Once identified, it's simple to tell Google which page is your canonical URL. All you need to do is add an HTML element, known as the 'canonical link element', to the <head> section of the other pages. It's a link tag with a rel="canonical" attribute that points Google to the URL of your chosen page.
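As a minimal sketch (the domain and path here are placeholders), the element placed in the <head> of each duplicating page looks like this:

```html
<!-- Placed in the <head> of each duplicating page; the href is the
     URL of the page you want Google to index -->
<link rel="canonical" href="https://example.com/necklaces">
```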
Pointing Google to canonical URLs is sometimes described as using 'soft redirects', as opposed to fully fledged 301 redirects. You can also use 301s if you can't or don't want to remove duplicate content.
Applying a 301 redirect to a URL will steer Google toward your chosen page. It will then be that page that the search engine indexes. This could be a useful solution to the issue of overlapping product category pages.
All you would need to do is identify which of the categories is most valuable to you from a web traffic point of view. You can then use 301 redirects from the other duplicate or overlapping pages to that category.
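If your site runs on an Apache server (an assumption; other servers have equivalent mechanisms), a rule like the following in your .htaccess file would do the job. Both paths are hypothetical examples:

```apache
# Permanently redirect the overlapping category page
# to the category you want Google to index
Redirect 301 /fathers-day-gifts https://example.com/gifts-for-him
```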
A Noindex tag is a directive that can be added to the HTML source code of a page. It explicitly tells Google that you do not wish the page to be indexed. That can prevent Google filtering out a page you do want indexed in favour of one which you don’t.
Noindex tags are the best solution for issues caused by printer friendly pages. You should apply a Noindex tag to each of those pages. That will ensure that the original version of each page will be the one which Google indexes.
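A minimal sketch of the tag, placed in the <head> of each printer friendly page:

```html
<!-- Tells search engines not to index this printer friendly version -->
<meta name="robots" content="noindex">
```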
Sometimes duplicate content issues just don't have a quick fix. That's the case if your problem is with blog posts or product descriptions. If they've created duplicate content you need to find the offending copy and rewrite it. This solution is time-consuming and labour-intensive, but there's just no other way to properly deal with the problem.
One way you can save yourself a little time and effort is by using a free online tool like Copyscape. Copyscape is designed to help you write content that’s not plagiarised. You can pop a URL into the site and it will search the web for duplicate content. That lets you find the exact elements of your content that you’ll need to cut, replace or rewrite.