Why Partial Feeds Will NOT Stop People From Stealing Your Content

Jan 07, 2007

I have managed to refrain from writing on this topic the approximately 467 times I've seen it come up on other sites. I guess I'm just too weak to hold out any longer.

I'm not going to rehash the same tired arguments. You can ask Google to do that for you. However, I want to address what seems to be the single biggest argument of people refusing to put anything more than a sentence or two in their RSS feeds:

If I put full-text feeds out, people will steal my content and put it up on their site.

Here's the deal, folks. If you are putting out a feed that doesn't include the actual article, you're doing, well, nothing to actually stop someone from grabbing the content automatically and putting it somewhere.

Basically, if you're out to slurp up RSS feeds and post them somewhere else, your pseudo-code looks something like this:

  1. Fetch feed.
  2. Split out each post into its content.
  3. Post the article on new site.
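For the curious, the steps above can be sketched in a few lines of PHP. This is only an illustration under some assumptions: the feed is plain RSS 2.0, the article text sits in each item's description element, and the function names (first_text, split_feed_into_posts) are made up for this example. An inline sample feed stands in for step 1, which in practice could be a single file_get_contents() call on the feed URL.

```php
<?php
// A minimal sketch of steps 1 and 2; all function names are hypothetical.

/** Return the trimmed text of the first <$name> descendant, or ''. */
function first_text(DOMElement $parent, string $name): string
{
    $nodes = $parent->getElementsByTagName($name);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
}

/** Step 2: split an RSS 2.0 document into an array of posts. */
function split_feed_into_posts(string $rssXml): array
{
    if ($rssXml === '') {
        return []; // loadXML() refuses an empty string
    }
    $previous = libxml_use_internal_errors(true); // keep parse errors quiet
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($rssXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return []; // not well-formed XML
    }

    $posts = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        $posts[] = [
            'title'   => first_text($item, 'title'),
            'link'    => first_text($item, 'link'),
            // Assumption: the article body is in <description>.
            'content' => first_text($item, 'description'),
        ];
    }
    return $posts;
}

// Step 1 stand-in: a tiny inline feed instead of a real HTTP fetch.
$sampleFeed = <<<XML
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item>
    <title>First post</title>
    <link>http://example.com/1</link>
    <description>Full text of the first post.</description>
  </item>
  <item>
    <title>Second post</title>
    <link>http://example.com/2</link>
    <description>Full text of the second post.</description>
  </item>
</channel></rss>
XML;

$posts = split_feed_into_posts($sampleFeed);

// Step 3 stand-in: print each title where a reposting script would publish.
foreach ($posts as $post) {
    echo $post['title'], "\n";
}
```

Each entry in $posts carries a title, a link, and the content, which is everything a reposting script would need.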

Geez, that's easy, goes the typical reaction (and it is). However, if you think that pulling out the second half of step 2 will stop this (aha, if we remove the content, they can't repost it), all you've done is add one step to the process.

  1. Fetch feed.
  2. Split out each post.
  3. Fetch the content from the linked page.
  4. Post the article on new site.
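As a rough sketch of how little the extra step changes, the loop below fills in each post's content from its linked page. Everything here is hypothetical: the function name fill_in_content, the example.com URLs, and the two callables. The canned pages stand in for real HTTP requests so the sketch runs offline, and strip_tags() is only a placeholder for a per-site filter that would pick out the element holding the post.

```php
<?php
// Sketch of the four-step version. All names here are hypothetical.

/**
 * Step 3: fill in the 'content' of each post by fetching its linked page.
 *
 * @param array    $posts     Posts as ['title' => ..., 'link' => ...].
 * @param callable $fetchPage Takes a URL, returns the page HTML or false.
 * @param callable $extract   Takes page HTML, returns the article text.
 */
function fill_in_content(array $posts, callable $fetchPage, callable $extract): array
{
    foreach ($posts as $i => $post) {
        $html = $fetchPage($post['link']); // the one extra step
        $posts[$i]['content'] = ($html === false) ? '' : $extract($html);
    }
    return $posts;
}

// Offline demonstration: canned pages instead of real HTTP requests.
$pages = [
    'http://example.com/a' =>
        '<html><body><div class="entry">Full text A</div></body></html>',
];

$fetchPage = function (string $url) use ($pages) {
    // A real run could call file_get_contents($url) here instead.
    return $pages[$url] ?? false;
};

$extract = function (string $html): string {
    // Placeholder filter: a real one would target the post's container.
    return trim(strip_tags($html));
};

// What a partial feed yields: titles and links, but no article text.
$partialPosts = [
    ['title' => 'A', 'link' => 'http://example.com/a'],
    ['title' => 'B', 'link' => 'http://example.com/missing'],
];

$fullPosts = fill_in_content($partialPosts, $fetchPage, $extract);
```

Passing the fetcher and the extractor in as callables keeps the loop identical for every site; only the extraction filter would differ from one target to the next.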

And, lest you think that the extra step is somehow difficult, here's a quick example of how to get the text of a post from this site (note that this site offers not only full-text feeds, but the content of those feeds is licensed under the Creative Commons Attribution license, so scraping it isn't necessary). This only gets the plain text, without HTML tags, links, etc., but it proves the point of just how easy it is to extract the content.

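<?php
// Collect libxml parse warnings quietly instead of hiding every error.
libxml_use_internal_errors(true);

$url = "http://www.wynia.org/wordpress/2007/01/06/i-want-to-have-been-able-to-get-it-on-my-own/";

$doc = new DOMDocument();
if (!$doc->loadHTMLFile($url)) {
  die("Could not load $url\n");
}

// On this site, the post body lives in <div class="entry">.
$output = "";
foreach ($doc->getElementsByTagName('div') as $tag) {
  if ($tag->getAttribute('class') == "entry") {
    // nodeValue is the element's text with all markup stripped.
    $output = $tag->nodeValue;
  }
}

// Write the extracted text (empty if no matching div was found).
file_put_contents("output.html", $output);
?>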

All I had to do was take a quick peek at my HTML to find the element that contains the actual post content, and the rest was pretty straightforward.

Now, if I were out to pull content from sites and re-post it, you can be pretty sure that I'd only have to spend a day or so writing filters for the sites I'd be targeting and there'd be no difference to me.

Now, I'm not advocating copyright infringement and am not insisting that anyone else change what they do. I'm just thoroughly sick of hearing this stated as the *reason* for doing it. It's a crutch and it's just not true.

And, yes, I do feel better.

 


5 Responses to “Why Partial Feeds Will NOT Stop People From Stealing Your Content”

  1. Tony Says:

    Hey, this is not as bad as back in the day when some sites would disable right-click. That level of ignorance made my mouse hand bleed on the inside.

    All this DRM stuff is simply an annoyance. If there's a way to actually display media, there's a way to record it.

  2. J Wynia Says:

    The right click thing still pops up from time to time. It's particularly irritating in modern tabbed browsers. Somehow my attempt to launch a link in a new tab is going to infringe on their text copyright. Of course, the copy in my browser cache is somehow only in my imagination.

  3. Justin Kistner Says:

    Thank you!

    I've been scraping HTML pages for quite some time (for good, not evil). Using a partial feed to stop content stealing not only doesn't help, but it hurts you by not allowing people to effectively subscribe to your feed, which hurts readership.

  4. Kyle Korleski Says:

    Yeah, I know that people, if they are dedicated enough, will steal my content no matter what. The reason I use partial feeds is because I want my readers to read the rest of the article.

  5. J Wynia Says:

    Kyle, that makes absolutely no sense. If you produce full feeds, the RSS readers already *have* the complete article. There is no "rest" for them to read.


© 2003-2009 J Wynia. All original content is licensed under the terms of the Creative Commons Attribution license unless otherwise noted. Content from other sources is licensed under its original terms.