Site Wide Duplicate Content Analyzer
This tool crawls your entire site and then analyzes all your pages for duplicate content. It shows the similarity percentage among all pages on your site, so you can see which pages are similar enough to trigger a duplicate-content flag in major search engines, which could then penalize your site.
The higher the similarity, the more likely you are to get zapped by them.
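The post doesn't say how the similarity percentage is calculated (the updates only mention a web-based PHP script). As a rough illustration only, here is a Python sketch that scores every pair of pages with the standard library's `difflib.SequenceMatcher`. The function names and the choice of `SequenceMatcher` are assumptions for this example, not the tool's actual method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity_percent(text_a: str, text_b: str) -> float:
    """Score two page texts from 0 to 100.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    matched characters and T is the total length of both strings.
    """
    return 100.0 * SequenceMatcher(None, text_a, text_b).ratio()


def pairwise_similarity(pages: dict[str, str]) -> list[tuple[str, str, float]]:
    """Compare every page with every other page once, most similar first."""
    rows = [
        (url_a, url_b, similarity_percent(pages[url_a], pages[url_b]))
        for url_a, url_b in combinations(sorted(pages), 2)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical page texts, assumed to be already stripped of HTML.
    demo = {
        "/blue.html": "Blue widgets for sale. Free shipping on all orders.",
        "/red.html": "Red widgets for sale. Free shipping on all orders.",
        "/about.html": "About us: we have been making widgets since 1999.",
    }
    for url_a, url_b, pct in pairwise_similarity(demo):
        print(f"{url_a} vs {url_b}: {pct:.1f}%")
```

Because every pair of pages is compared, the work grows with the square of the page count, which is one reason an approach like this struggles on large sites.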
Below is a screenshot. I don’t think the tool needs a how-to, as it is quite simple.
Download Site Wide Duplicate Content Analyzer (2.29 MB)
If you have any comments, questions, or anything else, please let me know.
P.S.: This tool is NOT intended for large websites, and there is a lot that can be improved.
UPDATE (8/05/2006)
This tool is NOT working at the moment: the web-based PHP script it relied on to compare the percentage of duplicate content among pages has been disabled by the host because it was causing a high load on the server. It won’t work until I get a dedicated server of my own to host it and update this tool. Thanks!
UPDATE (5/26/2006)
I just updated the tool and fixed “Overflow” errors and the like. I think that it can be used for small to mid-sized websites now.
If you have suggestions, run into problems using Duplicate Content Analyzer, or have found a bug, please feel free to write to sufyaaan AT gmail DOT com.
52 Responses to 'Site Wide Duplicate Content Analyzer'
on May 25th, 2006 at 8:31 am
Can you compare one domain to another to check for dup content?
on May 25th, 2006 at 8:59 am
This is a very handy tool.
on May 25th, 2006 at 9:51 am
Weezy
That’s a cool idea. The problem, though, is that comparing every page on one domain with every page on the other would take much longer. It is possible but not really feasible, IMHO.
Thanks for your input.
on May 25th, 2006 at 10:15 am
Hi
If no pages are shown in the “Similarity of pages” box, does that mean my pages are absolutely unique?
Thanks.
on May 25th, 2006 at 10:18 am
Hi Sufyan, looks very cool. Your last comment makes me wonder about Copyscape, which does check across the board using the Google API. Is this something you have experimented with?
Thanks again, Chris
on May 25th, 2006 at 10:22 am
[…] http://www.seojunkie.com/2006/05/24/site-wide-duplicate-content-analyzer/ […]
on May 25th, 2006 at 10:28 am
Chris
No, I haven’t yet played with the Google API to do the job. I will try to use it. Thanks for pointing this out.
:)
on May 25th, 2006 at 11:29 am
Tried the tool but got “Runtime error ‘35761’: Request timed out.” Maybe I need to redownload and reinstall it.
But from what I saw, it can really help me create better-quality content for my site.
Thanks for making this tool available.
on May 25th, 2006 at 4:34 pm
[…] Very useful free tool to check your site for duplicate content, which will hurt your rankings. It gives a fairly useful numeric percentage ranking by page, so that you can see potential problem areas. […]
on May 25th, 2006 at 4:54 pm
Is it possible to use this tool to look at certain folders within a website? My site is too large to run the tool on, but I would love to know if I am having problems with this.
I tried the software and it crashed, giving me some “-6” error.
on May 25th, 2006 at 6:02 pm
I get the following error:
Run-time error ‘6’
Overflow
on May 25th, 2006 at 7:41 pm
your skin is FAGGY!
fucking macs
nice app thoo
on May 26th, 2006 at 12:23 am
Bummer.
I got the same error as alan r. Is my site too big? (I was comparing 243 pages)
Run-time error ‘6’
Overflow
on May 26th, 2006 at 1:31 am
Hi all,
As I said above, it is good for smaller websites at the moment. I might integrate the Google API into the tool, as Chris suggested, to make it faster and support large websites sometime soon.
For those having problems using the tool, please send an email to sufyaaan AT gmail DOT com with the URL of the website you ran it on, so I can see what is causing the problem in debug mode and fix it.
:)
on May 26th, 2006 at 6:07 am
[…] What is still penalized, however, is filling multiple pages with nearly identical text. Some people add this duplicate content deliberately, hoping to get indexed faster and more extensively. However, the penalty Google hands out for it can also punish people who have nearly identical texts by accident. SEOJunkie has now written a little tool that can check your own website for duplicate content. For the download: http://www.seojunkie.com/2006/05/24/site-wide-duplicate-content-analyzer/ […]
on May 26th, 2006 at 5:36 pm
[…] In addition to that tool, Aaron Wall has also discussed ways to see duplicate content. He discussed how SEO Junkie has released a tool to look for duplicate content. However, this is not about duplicate content from OTHER sites, it’s from your own website! It is a tool which you can use to crawl websites and see how many times the same keyword or content is present compared to all the other pages. If both of them are weblogs, you can always check the time stamp of the posts. Of course they could always change them manually, that’s when you should use the cache feature of numerous search engines. Check the main page to see if the content (or post) was present at the given date. […]
on May 26th, 2006 at 10:58 pm
Good program if it works, but it keeps crashing on me when I run the scan.
on May 27th, 2006 at 2:43 am
The tool has been updated and a couple bugs removed.
on May 28th, 2006 at 10:33 pm
[…] Here’s a handy duplicate content analysis software tool that crawls your entire site and analyzes all your pages for duplicate content. It returns the similarity percentage for all pages of your site. The higher the percentage, the more likely the page is similar. […]
on May 30th, 2006 at 7:08 am
Hey,
Great tool. Quick issue. I ran a test on a .com that has external links to its equivalent .ca, .co.uk, and other international sites.
The tool looked at all the pages. Maybe you could add functionality to the tool where it stays on the target domain… and does not follow links out?
I like the tool though … good stuff.
on May 30th, 2006 at 7:11 am
Another quick suggestion…. maybe allow a person to stop the crawl and see the results at a certain point.
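The two suggestions above (stay on the target domain instead of following links out, and allow stopping the crawl partway), along with the earlier request to scan only certain folders, could be handled with a breadth-first crawl that checks each link's host and path and stops at a page budget. This is a hypothetical sketch, not the tool's code: `get_links` stands in for whatever fetches a page and extracts its `href` values, and `path_prefix` and `max_pages` are illustrative parameters.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlsplit


def in_scope(url: str, host: str, path_prefix: str = "/") -> bool:
    """True if url is http(s), on the given host, and under path_prefix."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return (
        parts.scheme in ("http", "https")
        and parts.hostname == host
        and path.startswith(path_prefix)
    )


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],  # hypothetical fetch-and-parse step
    path_prefix: str = "/",
    max_pages: int = 500,
) -> list[str]:
    """Breadth-first crawl that never leaves the start host.

    Stops after max_pages pages, so a partial result is always available.
    """
    start_url = urldefrag(start_url).url
    host = urlsplit(start_url).hostname
    seen = {start_url}
    queue = deque([start_url])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if link not in seen and in_scope(link, host, path_prefix):
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # A made-up link graph standing in for real HTTP fetches.
    site = {
        "http://example.com/": ["/blog/", "http://example.co.uk/", "/about"],
        "http://example.com/blog/": ["post-1", "/blog/#top"],
        "http://example.com/blog/post-1": ["/"],
    }
    print(crawl("http://example.com/", lambda u: site.get(u, []), max_pages=2))
```

Passing `get_links` in as a function keeps the scope and budget logic separate from networking, so it can be checked against a fake site like the one above.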
on May 31st, 2006 at 5:13 pm
Hi, great product. I ran this on a selection of pages and it works very well. I too experienced run-time errors while attempting to run the tool across the entire site. Not a good idea, considering it seems to be building a Cartesian product of all pages.
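The cost described above is easy to put numbers on. If every page is compared with every other page, n pages need n(n-1)/2 comparisons; comparing two domains page by page, as asked earlier in the thread, needs n × m. A quick sketch using page counts mentioned by commenters (the function names are just for illustration):

```python
def pairs_within_site(n_pages: int) -> int:
    """Unordered pairs when every page is compared with every other page."""
    return n_pages * (n_pages - 1) // 2


def pairs_across_sites(n_pages_a: int, n_pages_b: int) -> int:
    """Pairs when every page on one site is compared with every page on another."""
    return n_pages_a * n_pages_b


# Page counts mentioned in the comments.
for n in (23, 243, 650, 3500):
    print(f"{n:>5} pages -> {pairs_within_site(n):>9,} comparisons")
```

At 23 pages that is 253 comparisons; 243 pages already means 29,403, and 3,500 pages means 6,123,250.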
Any idea what rough figure we should be aiming at for uniqueness in Google’s eyes? Do you think Google can identify common objects within a page e.g. left nav, header, text ads etc and then genuinely compare the content on offer? A well structured site will always have these common components which will of course reduce the chances of a high uniqueness figure. Cheers.
on May 31st, 2006 at 10:31 pm
Lee
I don’t think there is an exact % of uniqueness that Google wants. But, as long as your titles, keywords, descriptions, page copy etc. are unique throughout your entire site, you need not worry.
And yes, Google is better than any other search engine at identifying decorative page elements like the nav menu, header, etc.
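Nobody outside Google knows exactly how it separates navigation and other template elements from a page's main content. A crude way to approximate the idea before scoring similarity is to drop lines that appear on most pages of a site. The sketch below assumes pages have already been reduced to plain text with one block per line; the function name and the 80% threshold are arbitrary choices for illustration.

```python
from collections import Counter


def strip_shared_lines(pages: dict[str, str], threshold: float = 0.8) -> dict[str, str]:
    """Remove lines that occur on at least `threshold` of all pages.

    Lines repeated across most of a site (menus, headers, footers) are
    treated as template; what is left approximates each page's own content.
    """
    line_counts = Counter()
    for text in pages.values():
        # Count each distinct line once per page.
        line_counts.update({line.strip() for line in text.splitlines()})
    cutoff = threshold * len(pages)
    shared = {line for line, count in line_counts.items() if count >= cutoff}
    cleaned = {}
    for url, text in pages.items():
        kept = [line.strip() for line in text.splitlines()
                if line.strip() and line.strip() not in shared]
        cleaned[url] = "\n".join(kept)
    return cleaned


if __name__ == "__main__":
    demo = {
        "/a": "Home | Products | Contact\nBlue widgets, free shipping.\nCopyright 2006",
        "/b": "Home | Products | Contact\nRed gadgets in stock now.\nCopyright 2006",
        "/c": "Home | Products | Contact\nAbout our company.\nCopyright 2006",
    }
    print(strip_shared_lines(demo))
```

With a single page every line counts as shared, so this only makes sense across several pages.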
on June 4th, 2006 at 12:01 am
[…] The higher the similarity, the more … Check it out! […]
on June 8th, 2006 at 6:05 am
Hi
Great tool and very handy. I found a slight glitch, though. When installing to a drive other than C: (for example, F:), it won’t pick up the intended path and reports errors saying the path/file was not found. Just thought I would let you know.
Thanks
on June 12th, 2006 at 11:48 am
[…] Duplicate content checker tool […]
on June 15th, 2006 at 11:58 pm
[…] The Site Wide Duplicate Content Analyzer will crawl a given site and show a similarity percentage for the pages. The tool won’t compare pages across different sites or domains, but if you read my previous post on the subject you’ll recall I think the issue with duplicate content is of greater concern for legitimate sites over the one site as when compared to another site. […]
on June 16th, 2006 at 2:16 am
Hi,
Great tool!
After one day, the program output for every comparison (across the whole site) is always “Unknown”.
Is it a bug? What can I do?
Thanks!
on June 16th, 2006 at 3:15 am
David
It accesses http://www.seojunkie.com to perform the similarity check between pages. Maybe you ran it while the site was down? That happens sometimes. Otherwise, too many people might have been using the tool at once, and it couldn’t get an accurate result due to the overload. Huh?
on June 20th, 2006 at 3:17 am
Hi,
The problem persists…
Is it possible to get a license or buy the program?
I like this software very much, and it is very helpful for my job.
Thanks!
on June 20th, 2006 at 9:52 pm
Nice idea,
keeps on returning unknown for all pages it checks.
on June 23rd, 2006 at 1:23 am
Nice tool.
I tested it on my personal site www.shimul.info.
I like it very much!!!
Shimiul
on July 8th, 2006 at 3:43 pm
Very good idea for a helpful tool.
With a medium project (my SEO blog: 650 pages in the Google index, yet the tool showed more than 3,500 pages?) it crashed for me. With a small project (23 pages) I tried it several times, but the program output (%) always shows “unknown”.
This tool would be very nice for SEOs if it used the Google API and could check only the pages returned by a Google site search (site:yourdomain.com) for duplicate content …
on July 18th, 2006 at 3:09 pm
Same result as many others, Unknown. I’d love for it to work though.
on July 24th, 2006 at 7:44 pm
[…] SEO Junkie’s Duplicate content analyzer software to help you to find duplicate content within your site. It currently doesn’t do net-wide searches, but it looks like the future possibility is there. […]
on July 25th, 2006 at 8:33 am
It doesn’t work…
Runtime error and unknown.
Would be great if it did work though…
on July 27th, 2006 at 3:48 pm
Mine doesn’t work properly. I only get “unknown” as results…
Does anybody have any suggestions?
on August 3rd, 2006 at 6:25 pm
[…] What’s your opinion of this sitewide duplicate content analyzer tool? It looks quite good, though I haven’t had time to run it on one of our properties yet. As the tool states, it’s NOT designed for big sites, and I fear that people might get the wrong impression about duplicate content from something like this. If 90% of your pages have 80% similarity, that’s actually perfectly alright. Many times, the only thing unique about a page is a single photo and a few lines of text; the permanent navigation and template account for the great majority of the page’s content. Search engines really don’t care about sites/pages like this; they know that e-commerce and many content sites look this way and ignore the duplicate portions in favor of the unique elements. […]
on August 4th, 2006 at 7:43 am
I was just thinking ‘if only there was a tool that could find duplicate content for me’, and lo and behold, this comes up.
A huge thank you from a new and aspiring SEO
on August 4th, 2006 at 1:31 pm
Ok I’m getting the ‘Unknown’ issue like everyone else. I’d really like to see this working properly though
on August 15th, 2006 at 2:13 pm
Hi, this is cool,
but where can we download the FREE tool???
on August 29th, 2006 at 8:55 am
how to download the tool
on August 30th, 2006 at 2:07 am
Renold: the tool is currently offline, as I mentioned in the update to the post above.
I will make it available for download as soon as that’s done. Thanks for waiting.
on August 31st, 2006 at 4:27 am
Thanks for the reply
on September 7th, 2006 at 3:19 am
I think it’s good software. It’s helpful for checking my site’s validity.
on September 11th, 2006 at 5:18 am
How can I contact you? I have some questions about your analyser tool. I can’t find an email address.
on September 24th, 2006 at 10:49 am
How can i download this tool?
on September 28th, 2006 at 10:01 pm
I want to download this useful tool to check my website, SBS Bulgarian properties Ltd., but I can’t find the download link.
on November 1st, 2006 at 10:37 am
Hi, I love this tool; I’ll be waiting for it.
I also have a question; can anyone help?
I have internal forums on my site at http://www.xaluan.com/modules.php?name=Forums,
but that URL is long and hard to remember, and a shortcut on the homepage doesn’t really work well. So I created a subdomain,
http://www.forums.xaluan.com, and another URL, http://www.xaluan.com/forum, that both point to the long URL.
I’m just not sure whether that will count as duplicate content and hurt my Google PR.
Any advice on how to solve this?
Thanks
on November 7th, 2006 at 1:12 pm
[…] “Duplicate content” has become a standard part of the SEO lexicon over the last year or so (2005-2006), and over that time, a handful of common causes have been identified, the most common of which is poor URL handling. There is one cause, legitimately having duplicate or very similar pages throughout your site, that is not caused by poor URL handling, and I would recommend SEOJunkie’s tool for detecting that type of issue. To note, there is some skepticism (really, skepticism in the SEO world?) as to the actual effect of duplicate content on a site’s ranking. I believe that the largest impact occurs through PR dispersion. This occurs when inbound and intra-site links point to various versions of the same page, causing the alternate versions to split link power rather than one single copy enjoying all the link power. […]
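The pingback above names poor URL handling as the most common cause, with links pointing at several versions of the same page. One hedged way to spot such variants in a crawled URL list is to reduce each URL to a normalized form and group the ones that collide. The rules in this sketch (lower-case host, drop a leading `www.`, drop default ports and `#fragments`, sort query parameters) and the function names are assumptions that suit many sites but not all; some applications do treat `www` and the bare domain, or parameter order, differently. Aliases on different paths or subdomains, like the forum URLs in the comment just above, would only show up by comparing page content.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse common spelling variants of the same URL into one form."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lower-cased
    if host.startswith("www."):
        host = host[4:]  # assumption: www.example.com and example.com are one site
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port is not None and parts.port != default_port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Assumption: parameter order does not change the page served.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, host, path, query, ""))  # fragment dropped


def group_variants(urls: list[str]) -> dict[str, list[str]]:
    """Map each normalized URL to the raw URLs that collapse onto it (2+ only)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: raw for key, raw in groups.items() if len(raw) > 1}


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/",
        "http://example.com",
        "http://example.com/index.php?b=2&a=1",
        "HTTP://WWW.Example.com:80/index.php?a=1&b=2#top",
    ]
    for canonical, variants in group_variants(crawled).items():
        print(canonical, "<-", variants)
```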
on September 4th, 2007 at 10:04 am
Get rid of your duplicate content…
What does duplicate content mean, why can it be dangerous for your search engine ranking and how to avoid creating duplicate content?
There has been a very intensive discussion about duplicate content in the last few years. It has become a big problem …
on January 26th, 2008 at 6:23 pm
[…] The Site Wide Duplicate Content Analyzer will crawl a given site and show a similarity percentage for the pages. The tool won’t compare pages across different sites or domains, but if you read my previous post on the subject you’ll recall I think the issue with duplicate content is of greater concern for legitimate sites over the one site as when compared to another site. […]