Glossary entry: Duplicate Content

Duplicate Content - What is it?

Duplicate content (DC) is content that appears in exactly the same or very similar form (near duplicate content) on your own website or on different domains. Not only 1:1 copies count as duplicate content: even individual repeated text blocks can be enough for a page to no longer be recognized as "unique content".

In other words, identical content is accessible under different URLs and has been indexed by Google under those different URLs. If different web pages are indexed with duplicate content, this can have a negative impact on the ranking in the SERPs and result in a demotion, up to and including deindexing.

What are the types of duplicate content?

Internal duplicate content

Internal duplicate content is located within your own domain. Online stores and editorial CMSs often have a problem with duplicate content, because product detail pages can be called up directly and are also present a second time under the associated category and/or product pages (e.g. https://www.shopdomain.at/produktdetails and https://www.shopdomain.at/kategorie/produktseite/produktdetails).

Some examples of internal duplicate content:

Pages with identical content can be accessed via different domains, subdomains or URLs
Printer-optimized version of one or more web pages
Identical page title and meta description (SERP preview)
Mobile version of a website with identical content to the desktop version
Duplicate text in header, footer or sidebar
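
Exact internal duplicates like the two shop URLs above can be found by comparing a fingerprint of each page's text. The following is a minimal sketch, not a description of how Google works: it assumes the page texts have already been extracted, and the function names and sample texts are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose text is identical after normalization."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical page texts keyed by URL (sample data for illustration only).
pages = {
    "https://www.shopdomain.at/produktdetails": "Red  running shoe, size 42.",
    "https://www.shopdomain.at/kategorie/produktseite/produktdetails": "red running shoe, size 42.",
    "https://www.shopdomain.at/kontakt": "Contact us by e-mail.",
}
duplicates = find_exact_duplicates(pages)
```

Each group in the result is a set of URLs that should be consolidated, for example with a canonical tag or a 301 redirect.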

External Duplicate Content

External duplicate content occurs when identical or very similar content appears on several domains (e.g., your own website can be found under several URLs). Duplicate content is also created by copying text from other websites or by content theft.

It is difficult for the Google bot to find out which page is more suitable for a search query, so in the case of identical content it decides on its own which search result is more relevant for the SERPs. This leads to fluctuations in the ranking and affects the visibility of your own website.

With targeted SEO measures (search engine optimization) you can counter this problem and offer the Google bot "unique content" again.

More examples of external duplicate content:

Manufacturer product descriptions
Content syndication via RSS feed
Content scraping (usually a bot that downloads and copies the entire content)
Distribution of press releases
Content on affiliate sites

Near Duplicate Content

Near duplicate content is content on a website or within a domain that is very similar, but not completely identical.

There are two cases in which Google classifies content as near duplicate content:

Copied content, only slightly modified
1:1 copy, but with recurring text fragments in sidebar, footer or header
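
How similar two texts are can be estimated with word shingles and the Jaccard index. This is a common textbook technique and only a sketch: the source does not state how Google measures similarity, and the shingle size and sample texts are assumptions.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences of the given length."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical example: a copied sentence with one word changed.
original = "our red running shoe is light and fast on every road"
modified = "our red running shoe is light and quick on every road"
similarity = jaccard_similarity(original, modified)
```

A value close to 1.0 points to a slightly modified copy. Which threshold counts as "near duplicate" is a judgment call and has to be chosen per project.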

How is duplicate content created?

Identical content can come from different sources and have different causes. Below are some examples of how duplicate content can appear on your site or domain:

Content accessible with or without subdomain (www.)
Pages with http and https
Start page available with and without "index.html"
Identical content is linked with different URL parameters (e.g. products in an online store are filtered by different criteria, but the same results are displayed)
Pagination (page numbering)
Different language versions of a website
Use of exactly the same content and text from external pages
Category and tag pages (e.g. on blogs)
Domain change (use of the same content)
Session IDs in the URL (tracking user behavior)
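
Several of these URL variants (http/https, with or without www., "index.html", session IDs, parameter order) can be mapped to one preferred URL. The sketch below uses Python's urllib.parse and rests on assumptions: that the https and www. variant is the preferred one, that the host is not some other subdomain, and that the listed parameter names carry no content. All of these have to be adapted to the site in question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of parameters that do not change the page content.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str) -> str:
    """Map common URL variants of a page to one preferred URL."""
    parts = urlsplit(url)

    # Assumption: the www. host is the preferred one.
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host

    # Treat ".../index.html" like the directory itself.
    path = parts.path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    if not path:
        path = "/"

    # Drop ignored parameters and sort the rest so their order does not matter.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    query = urlencode(sorted(pairs))

    # Assumption: https is the preferred scheme; fragments are discarded.
    return urlunsplit(("https", host, path, query, ""))
```

For example, normalize_url("http://shopdomain.at/index.html") and normalize_url("https://www.shopdomain.at/") both return the same URL, which can then serve as the target of a redirect or canonical tag.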

But not everything that looks like duplicate content actually is. Some duplicated content does not cause any problems for the Google algorithm, for example:

Translations - correctly marked up with hreflang
Quotations - correctly marked up in the source code
Content in apps

How to avoid duplicate content?

Ideally, duplicate content is avoided from the start. Write your own SEO texts and do not copy indiscriminately from other sites. Always aim to create unique, high-quality content ("unique content") that offers your users added value.

Not only will your website visitors be grateful, the Google bot will also reward it, and you will achieve a better ranking in the search engine.

Technical SEO measures to avoid internal duplicate content

Keep a clean URL structure on your site - content is always available under one URL only (no unnecessary URL variations)
Set a canonical tag - point to the original URL
Set up 301 redirects - redirect URLs with duplicate content to a single URL (e.g. when relaunching the site)
Use the hreflang tag for translations - language and country identification of the page
Use the settings in Google Search Console (GSC)
noindex tag - use only for non-relevant content
robots.txt - use sparingly to exclude pages from crawling
Use tools - detect external duplicate content (Copyscape, Webconfs, Siteliner)
Regularly check your own website for duplicate content
Watch out for content theft from your domain - ask the webmaster of the other site to link to the original source or to remove the copied content. If there is no other option, report the copyright infringement to Google via the form in Google Search Console.
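
The canonical, hreflang and noindex markup mentioned in the list above consists of short tags in the page's head section. The following sketch generates them as strings. The helper names are hypothetical, the /en/ URL is invented for illustration, and how the tags get into the page depends on the CMS or shop system in use.

```python
from html import escape


def canonical_tag(url: str) -> str:
    """Tag that points search engines to the original URL of the content."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def hreflang_tags(alternates: dict[str, str]) -> list[str]:
    """One tag per language or country version of the same page."""
    return [
        f'<link rel="alternate" hreflang="{escape(lang, quote=True)}" '
        f'href="{escape(url, quote=True)}">'
        for lang, url in alternates.items()
    ]


def noindex_tag() -> str:
    """Tag that asks search engines not to index the page."""
    return '<meta name="robots" content="noindex">'


# Hypothetical usage for a product page with a German (Austria) and an English version.
head_tags = [
    canonical_tag("https://www.shopdomain.at/produktdetails"),
    *hreflang_tags({
        "de-AT": "https://www.shopdomain.at/produktdetails",
        "en": "https://www.shopdomain.at/en/produktdetails",  # assumed URL
    }),
]
```

A page should normally carry either a canonical tag pointing to the original or a noindex tag, depending on whether the duplicate should pass its signals on or simply stay out of the index.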

Finally, it should be said that avoiding duplicate content is a very important point for SEO (search engine optimization), because search engines also consume resources (e.g. hardware, energy, time) when crawling and indexing.

An SEO agency, like us at Comma99, can help by cleaning up or removing unwanted duplicate content and replacing it with unique, high-quality content. SEO technicians then take care of the technical implementation of the necessary measures.