“Data scrapers”: can’t WordPress do anything?

  • I am tired of these websites stealing my entries. Most of them run on WordPress but are hosted on an independent server. They take an entry, put it on their own site, attribute it to somebody else, and add a link back to the WordPress site they stole it from, all without permission. I only found out because of the spam alerts. Finding a contact on these sites is next to impossible, and writing to the host does not work either; they don’t care.
    Is there anything we can do to keep these scrapers from stealing our entries? Is there anything WordPress can do to protect us from these thieves?
    One would think WordPress would want to help us protect our entries, which are copyrighted, from the people who steal them. So why doesn’t WordPress have something to protect our sites and entries?

  • We’d love to stop them. Any suggestions as to how?

  • I have no idea how to stop it; that’s why I was asking. I simply wanted to know whether there was a way to stop it, and whether WordPress had or knew of one. I’ve never had this problem before in all the time I’ve been keeping online blogs or journals. It only started happening when I came to WordPress, though I’m sure it happens on other sites too.
    So, I guess there’s no way to stop it, eh?

  • Anything that is put onto the web can be scraped. As soon as someone visits a page on your blog, or anywhere else on the web, everything on that page is downloaded to the cache on their computer.

    The only way to stop scrapers from getting your material is to not publish anything on the web. You can also mark your blog as private so that only the people you choose can see your material.

    You can minimize it by setting your RSS feed to display only one or two posts, and by setting it to show a summary only (Options > Reading).

  • tsp is correct: there are no effective ways to prevent copying. There are ineffective ways, but they usually serve only to annoy your audience. Just ask the RIAA.

    Setting your RSS feed to summary only might help in some cases, if that’s how the scrapers are stealing your posts.

  • This is the problem with RSS feeds: they are useful for genuine readers, but they are also very useful to content thieves, and WordPress doesn’t give you the option to turn them off. One thing WordPress could do is give you the option of adding a copyright notice to every post in your feed. It won’t stop people from republishing your posts, but at least everyone would know the original source.

    If they can’t do that, you could always paste a copyright notice into every post yourself. If the scraper blog runs Google ads, and most of them do, you could also try reporting it to Google.

  • I suspect the copyright notice option will show up in the future. It briefly appeared by mistake in the “enhanced feeds” section a while back.

  • Does this apply to trackbacks and pingbacks? Are they part of the problem?

  • I can tell you from personal experience that the “feed copyright” doesn’t actually stop the blog scrapers, but it does help you find them :) They still manage to steal your content if they think it will bring them traffic to serve up their ads.

    Trent

  • Trent, I agree. I’ve had my entire blog ripped off, including copyright notices and the Copyscape logo. A determined thief will steal.

  • I think I’ll try the suggestion about the Options setting; it sounds like the best way to go, at least until something can be done about these scumscrapers. When I went to change it, I noticed this statement:

    Note: If you use the <!--more--> feature, it will cut off posts in RSS feeds.

    I don’t mean to sound like an idiot, but what does that mean? Where do you put the <!--more--> tag, and what does it do?

  • Hi. I’d like to add my own observation to this issue. Over time, I have noticed that because WordPress.com enthusiastically blocks sites that violate its terms of service, the scrapers are shifting to self-hosted blogs running the WordPress.org software.

    See this site, which has been lifting my posts:
    http://nokia-n95.thegeekyblog.com/

    Technically this means we should first complain to the ISP, but WP.org could also choose not to license the template, etc., to them. They could, however, continue to host using some other template, and anyway, what is to say they will not steal WP templates?

    I think we should create a repository exposing these thieving sites, complete with the names of their ISPs, and threaten joint action.

    Can WordPress help us?

    Thanks.

    PS: That <!--more--> thing means that if your content is syndicated, as one of my blogs is, readers cannot see the whole post, and it creates a mega mess.

  • The scrapers ignore content licenses, so why would they pay attention to a theme’s license? (Never mind that the theme authors, not Automattic, decide what their licenses are.)

    Only the copyright holder can legally take action against an infringer. We can’t do that on your behalf.

    Thanks for the suggestions, I’m looking into some ideas.

  • “Note: If you use the <!--more--> feature, it will cut off posts in RSS feeds.”

    This is no longer true.

  • You mean since the upgrade this has changed again?

  • This thread discusses the change in the More tag’s effect on feeds: https://en.forums.wordpress.com/topic.php?id=21924&replies=19. The change occurred in February.

    As of last night, the More tag did not prevent feeds from sending the whole post. Also, the option to show a “summary” of your post in the feed doesn’t seem to do anything. (This was suggested above as another way of thwarting scrapers.)

  • The topic ‘“Data scrapers”: can’t WordPress do anything?’ is closed to new replies.