SEO Junkie


Site Wide Duplicate Content Analyzer

Posted in Tools by Sufyan on May 24th, 2006

This tool crawls your entire site and then analyzes all of your pages for duplicate content. It shows the similarity percentage between the pages on your site, so you can see which pages are similar enough to trigger a flag in the major search engines, which may then penalize your site for duplicate content.

The higher the similarity, the more likely it is that you will get zapped by them.
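To make the idea of a "similarity percentage" concrete, here is a minimal sketch in Python. It is not the tool's actual method, which this post does not describe; it assumes a common technique, Jaccard similarity over three-word shingles. The function names, the 50% threshold, and the sample pages are all hypothetical.

```python
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        # Very short text: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, as a percentage."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 100.0  # two empty pages are treated as identical
    return 100.0 * len(a & b) / len(a | b)


def compare_pages(pages, threshold=50.0):
    """Compare every pair of pages (url -> text).

    Returns (url_a, url_b, percent) tuples at or above the threshold,
    most similar pair first. The threshold is an arbitrary assumption.
    """
    results = []
    for (url_a, text_a), (url_b, text_b) in combinations(sorted(pages.items()), 2):
        pct = similarity_percent(text_a, text_b)
        if pct >= threshold:
            results.append((url_a, url_b, round(pct, 1)))
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical page texts, already stripped of HTML.
    pages = {
        "/a": "the quick brown fox jumps",
        "/b": "the quick brown fox jumps",
        "/c": "completely different words here now",
    }
    for url_a, url_b, pct in compare_pages(pages):
        print(f"{url_a} vs {url_b}: {pct}% similar")
```

Comparing every page with every other page means n(n-1)/2 comparisons, so the work grows quickly with site size: a 243-page site already needs 29,403 comparisons. That is one plausible reason an approach like this suits small sites better than large ones.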

Below is a screenshot. I don’t think the tool needs a how-to, as it is quite simple.

 

Download Site Wide Duplicate Content Analyzer (2.29 MB)

If you have any comments, questions, or anything else, please let me know.

P.S – This tool is NOT intended for large websites and has a lot that can be improved.

UPDATE (8/05/2006)

This tool is NOT working at the moment. The web-based PHP script it used to compute the % of duplicate content among pages has been disabled by the host because it was causing high load on the server. The tool won’t work until I get a dedicated server of my own to host the script and update the tool. Thanks!

UPDATE (5/26/2006)

I just updated the tool and fixed the “Overflow” errors and similar problems. I think it can now be used for small to mid-sized websites.

If you have suggestions, run into problems using Duplicate Content Analyzer, or find a bug, please feel free to write to sufyaaan AT gmail DOT com.

52 Responses to 'Site Wide Duplicate Content Analyzer'


  1. J-Weezy said,

    on May 25th, 2006 at 8:31 am

    Can you compare one domain to another to check for dup content?

  2. MrWorthing said,

    on May 25th, 2006 at 8:59 am

    This is a very handy tool.

  3. Sufyan said,

    on May 25th, 2006 at 9:51 am

    Weezy

    That’s a cool idea. The problem is that comparing each and every page on both domains would take much longer. It is possible but not really practical, IMHO.

    Thanks for your input. ;)

  4. boomer said,

    on May 25th, 2006 at 10:15 am

    Hi
    If no pages are shown in the “Similarity of pages” box, does it mean that my pages
    are absolutely unique?
    Thanks.

  5. Chris said,

    on May 25th, 2006 at 10:18 am

    Hi Sufyan, looks very cool. Your last comment makes me wonder about Copyscape, which does check across the board using the Google API. Is this something you have experimented with?

    Thanks again, Chris


  6. on May 25th, 2006 at 10:22 am

    […] http://www.seojunkie.com/2006/05/24/site-wide-duplicate-content-analyzer/ […]

  7. Sufyan said,

    on May 25th, 2006 at 10:28 am

    Chris

    No, I haven’t yet played with the Google API to do the job. I will try to make use of it. Thanks for pointing this out.

    :)

  8. Shorty said,

    on May 25th, 2006 at 11:29 am

    Tried the tool but got a “Runtime error ‘35761’: Request timed out” - maybe I need to redownload it and reinstall.
    But, from what I saw it can really help me to create better quality content for my site.
    Thanks for making this tool available.


  9. on May 25th, 2006 at 4:34 pm

    […] Very useful free tool to check your site for duplicate content, which will hurt your rankings. It gives a fairly useful numeric percentage ranking by page, so that you can see potential problem areas. […]

  10. Jessica said,

    on May 25th, 2006 at 4:54 pm

    Is it possible to use this tool to look at certain folders within a website? My site is too large to run the tool on, but I would love to know if I am having problems with this.

    I tried the software and it crashed giving me some “-6” error.

  11. Alan R said,

    on May 25th, 2006 at 6:02 pm

    I get the following error:
    Run-time error ‘6’

    Overflow

  12. samurai said,

    on May 25th, 2006 at 7:41 pm

    your skin is FAGGY!

    fucking macs

    nice app thoo

  13. corinaw said,

    on May 26th, 2006 at 12:23 am

    Bummer.
    I got the same error as alan r. Is my site too big? (I was comparing 243 pages)

    Run-time error ‘6’

    Overflow

  14. Sufyan said,

    on May 26th, 2006 at 1:31 am

    Hi all,

    As I said above, it is good for smaller websites at the moment. I might integrate the Google API into the tool soon, as Chris suggested, to make it faster and to support large websites.

    If you are having problems using the tool, I’d suggest sending an email to sufyaaan AT gmail DOT com with the URL of the website you ran it on, so I can see what is causing the problem in debugging mode and fix it.

    :)


  15. on May 26th, 2006 at 6:07 am

    […] What is still penalized, however, is filling multiple pages with nearly identical text. Some people add this duplicate content deliberately, in the hope of being indexed faster and more extensively. The penalty Google hands out for it, however, can also punish people who have nearly identical texts by accident. SEOJunkie has now written a small tool that can check your own website for duplicate content. For the download: http://www.seojunkie.com/2006/05/24/site-wide-duplicate-content-analyzer/ […]


  16. on May 26th, 2006 at 5:36 pm

    […] In addition to that tool, Aaron Wall has also discussed ways to see duplicate content. He discussed how SEO Junkie has released a tool to look for duplicate content. However, this is not about duplicate content from OTHER sites, it’s from your own website! It is a tool which you can use to crawl websites and see how many times the same keyword or content is present compared to all the other pages. If both of them are weblogs, you can always check the time stamp of the posts. Of course they could always change them manually, that’s when you should use the cache feature of numerous search engines. Check the main page to see if the content (or post) was present at the given date. […]

  17. Khurram Ali said,

    on May 26th, 2006 at 10:58 pm

    good program if it works, keeps crashing on me when I run the scan.

  18. Sufyan said,

    on May 27th, 2006 at 2:43 am

    The tool has been updated and a couple bugs removed.


  19. on May 28th, 2006 at 10:33 pm

    […] Here’s a handy duplicate content analysis software tool that crawls your entire site and analyzes all your pages for duplicate content. It returns the similarity percentage for all pages of your site. The higher the percentage, the more likely the page is similar. […]

  20. Darryl said,

    on May 30th, 2006 at 7:08 am

    Hey,

    Great tool. Quick issue. I ran a test on a .com that has external links to its equivalent .ca, .co.uk, and other international sites.

    The tool looked at all the pages. Maybe you could add functionality to the tool where it stays on the target domain… and does not follow links out?

    I like the tool though … good stuff.

  21. Darryl said,

    on May 30th, 2006 at 7:11 am

    Another quick suggestion…. maybe allow a person to stop the crawl and see the results at a certain point.

  22. Lee said,

    on May 31st, 2006 at 5:13 pm

    Hi, Great product - I ran this on a selection of pages and it works very well. I too experienced run time errors while attempting to run the tool across the entire site. Not a good idea considering it seems to be building a cartesian product for all pages.

    Any idea what rough figure we should be aiming at for uniqueness in Google’s eyes? Do you think Google can identify common objects within a page e.g. left nav, header, text ads etc and then genuinely compare the content on offer? A well structured site will always have these common components which will of course reduce the chances of a high uniqueness figure. Cheers.

  23. Sufyan said,

    on May 31st, 2006 at 10:31 pm

    Lee

    I don’t think there is an exact % of uniqueness that Google wants. But, as long as your titles, keywords, descriptions, page copy etc. are unique throughout your entire site, you need not worry.

    And yes, Google is better than any other search engine at identifying decorative elements of a page such as the nav menu, header, etc. ;)


  24. on June 4th, 2006 at 12:01 am

    […] The higher the similarity, the more … Check it out! […]

  25. Vincent said,

    on June 8th, 2006 at 6:05 am

    Hi
    Great tool and very handy. I found a slight glitch in it: when I install to a drive other than C (for example, F), it doesn’t pick up the intended path and reports errors about the path or file not being found. Just thought I would let you know.

    Thanks


  26. on June 12th, 2006 at 11:48 am

    […] Duplicate content checker tool […]


  27. on June 15th, 2006 at 11:58 pm

    […] The Site Wide Duplicate Content Analyzer will crawl a given site and show a similarity percentage for the pages. The tool won’t compare pages across different sites or domains, but if you read my previous post on the subject you’ll recall I think the issue with duplicate content is of greater concern for legitimate sites over the one site as when compared to another site. […]

  28. David said,

    on June 16th, 2006 at 2:16 am

    Hi,
    Great tool!
    After one day, the program output for every comparison (on every site) is always “Unknown”.
    Is it a bug? What can I do?

    Thanks!

  29. Sufyan said,

    on June 16th, 2006 at 3:15 am

    David

    It accesses http://www.seojunkie.com to perform the similarity check between pages. Maybe you tried to run it while the site was down for a while? It happens sometimes. Otherwise, too many people might have been using the tool at once, and it couldn’t get an accurate result because of the overload. Huh? ;)

  30. David said,

    on June 20th, 2006 at 3:17 am

    Hi,
    the problem persists…
    Is it possible to get a licence or buy the program?
    I like this software very much and it is very helpful for my job.
    Thanks!

  31. trung said,

    on June 20th, 2006 at 9:52 pm

    Nice idea,

    keeps on returning unknown for all pages it checks.


  32. on June 23rd, 2006 at 1:23 am

    nice tools

    i tested it on my personal site www.shimul.info .

    i like it much !!!

    Shimiul


  33. on July 8th, 2006 at 3:43 pm

    Very good idea for a helpful tool.

    With a medium project (my SEO blog: 650 pages in the Google index, though the tool showed more than 3500 pages?) it crashed for me. With a small project (23 pages) I tried it several times, but the program output (%) always shows “unknown”?

    This tool would be very nice for SEOs if it used the Google API and could check only the pages returned by a Google site search (site:yourdomain.com) for duplicate content … ;-)

  34. Judith said,

    on July 18th, 2006 at 3:09 pm

    Same result as many others, Unknown. I’d love for it to work though.


  35. on July 24th, 2006 at 7:44 pm

    […] SEO Junkie’s Duplicate content analyzer software to help you to find duplicate content within your site. It currently doesn’t do net-wide searches, but it looks like the future possibility is there. […]

  36. K said,

    on July 25th, 2006 at 8:33 am

    It doesn’t work…

    Runtime error and unknown.

    Would be great if it did work though…

  37. Win said,

    on July 27th, 2006 at 3:48 pm

    Mine doesn’t work properly. I only get “unknown” as results…

    Does anybody have any suggestions for what to do?


  38. on August 3rd, 2006 at 6:25 pm

    […] What’s your opinion of this sitewide duplicate content analyzer tool? It looks quite good, though I haven’t got time to run it on one of our properties at the moment. As the tool states, it’s NOT designed for big sites, and I fear that people might get the wrong impression about duplicate content with something like this. If 90% of your pages have 80% similarity, that’s actually perfectly alright. Many times, the only thing unique about pages is a single photo and a few lines of text - the rest of the permanent navigation and template accounts for the great majority of a page’s content. Search engines really don’t care about sites/pages like this - they know that e-commerce and many content sites look this way and ignore the duplicate portions in favor of the unique elements. […]

  39. Pete said,

    on August 4th, 2006 at 7:43 am

    I was just thinking ‘if only there was a tool that could find duplicate content for me’ and lo and behold this comes up.

    A huge thank you from a new and aspiring SEO :)

  40. Pete said,

    on August 4th, 2006 at 1:31 pm

    Ok I’m getting the ‘Unknown’ issue like everyone else. I’d really like to see this working properly though :)

  41. Kinglobang said,

    on August 15th, 2006 at 2:13 pm

    hi , this is cool
    but where can we download the FREE tool ???

    visit my blog

  42. renold said,

    on August 29th, 2006 at 8:55 am

    how to download the tool

  43. Sufyan said,

    on August 30th, 2006 at 2:07 am

    Renold - The tool is currently offline as I mentioned in the update to the post above.

    UPDATE (8/05/2006)

    This tool is NOT working at the moment. The web-based PHP script it used to compute the % of duplicate content among pages has been disabled by the host because it was causing high load on the server. The tool won’t work until I get a dedicated server of my own to host the script and update the tool. Thanks!

    I will make it available for download as soon as it is done. Thanks for waiting. :)

  44. renold said,

    on August 31st, 2006 at 4:27 am

    Thanks for the reply

  45. smile said,

    on September 7th, 2006 at 3:19 am

    I think it’s good software. It’s helpful for checking my site’s validity.

  46. patrick said,

    on September 11th, 2006 at 5:18 am

    how can i contact you? i have some questions about
    your analyser tool. i can’t find an email address.

  47. nachev said,

    on September 24th, 2006 at 10:49 am

    How can i download this tool?

  48. nachev said,

    on September 28th, 2006 at 10:01 pm

    I want to download this useful tool to check my website:
    SBS Bulgarian properties Ltd., but i can’t find the download link.

  49. binhaus said,

    on November 1st, 2006 at 10:37 am

    Hi, I love this tool and will be waiting for it.
    I also have a question, if anyone can help.
    My site has an internal forum at http://www.xaluan.com/modules.php?name=Forums
    but that URL is too long to remember, and a shortcut on the homepage doesn’t really work well. So I created a subdomain,
    http://www.forums.xaluan.com, and another address, http://www.xaluan.com/forum, that both point to the long URL.
    I’m just not sure whether that counts as duplicate content that will hurt my Google PR.
    Any advice on how to solve this?
    Thanks


  50. on November 7th, 2006 at 1:12 pm

    […] “Duplicate content” has become a standard part of the SEO lexicon over the last year or so (2005-2006), and over that time, a handful of common causes have been identified - the most common of which is poor URL handling. There is one cause, legitimately having duplicate pages or very similar pages throughout your site, which is not caused by poor URL handling, and I would recommend SEOJunkie’s tool for determining that type of issue. To note, there is some skepticism (really, skepticism in the SEO world?) as to the actual effect of duplicate content on a site’s ranking. I believe that the largest impact occurs through PR dispersion. This occurs when inbound and intra-site links point to various versions of the same page, thus causing the alternate versions to split link power, rather than one single copy enjoying all the link power. […]


  51. on September 4th, 2007 at 10:04 am

    Get rid of your duplicate content…

    What does duplicate content mean, why can it be dangerous for your search engine ranking and how to avoid creating duplicate content?
    There has been a very intensive discussion about duplicate content in the last few years. It has become a big problem …


  52. on January 26th, 2008 at 6:23 pm

    […] The Site Wide Duplicate Content Analyzer will crawl a given site and show a similarity percentage for the pages. The tool won’t compare pages across different sites or domains, but if you read my previous post on the subject you’ll recall I think the issue with duplicate content is of greater concern for legitimate sites over the one site as when compared to another site. […]

