Why Partial Feeds Will NOT Stop People From Stealing Your Content
I have managed to refrain from writing on this topic the approximately 467 times I've seen it come up on other sites. I guess I'm just too weak to hold out any longer.
I'm not going to rehash the same tired arguments. You can ask Google to do that for you. However, I want to address what seems to be the single biggest argument of people refusing to put anything more than a sentence or two in their RSS feeds:
If I put full-text feeds out, people will steal my content and put it up on their site.
Here's the deal, folks. If you are putting out a feed that doesn't include the actual article, you're doing, well, nothing to actually stop someone from grabbing the content automatically and putting it somewhere.
Basically, if you're out to slurp up RSS feeds and post them somewhere else, your pseudo-code looks something like this:
- Fetch feed.
- Split out each post into its content.
- Post the article on new site.
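For the curious, here is roughly what the "split out each post" step can look like. This is a minimal sketch, not anyone's actual tool: the sample feed, the example.com URLs, and the function name splitFeedItems are all made up for illustration, and it assumes a plain RSS 2.0 feed that keeps the post text in the description element. In real use the XML would come from something like file_get_contents() on the feed URL.

```php
<?php
// Hypothetical sample feed, inlined so the sketch runs without network access.
$rss = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Full text of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Full text of the second post.</description>
    </item>
  </channel>
</rss>
XML;

// Split an RSS 2.0 document into one array per post (hypothetical helper).
function splitFeedItems(string $rssXml): array
{
    $feed = simplexml_load_string($rssXml);
    if ($feed === false) {
        return []; // not parseable as XML
    }
    $posts = [];
    foreach ($feed->channel->item as $item) {
        $posts[] = [
            'title'   => (string) $item->title,
            'link'    => (string) $item->link,
            'content' => (string) $item->description,
        ];
    }
    return $posts;
}

$posts = splitFeedItems($rss);
foreach ($posts as $post) {
    // "Post the article" depends entirely on the target site, so this only prints.
    echo $post['title'], ': ', $post['content'], "\n";
}
```

Note that the link comes along for free with every item, whether or not the content does.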
Geez. That's easy, goes the typical reaction (and it is). However, if you think that pulling out the second half of step 2 (aha, if we remove the content, they can't repost it) will stop anyone, think again: all you've done is add one step to the process.
- Fetch feed.
- Split out each post.
- Fetch the content from the linked page.
- Post the article on new site.
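To show how small that one extra step is, here is a sketch of the loop. The names fillFromLinkedPages, $fakeFetch, and $stripTags are hypothetical, and the two stand-in callables exist only so the example runs offline. A real scraper would presumably pass something like file_get_contents as the fetcher and a site-specific extractor in place of strip_tags.

```php
<?php
// For each post, fetch the linked page and swap the teaser for whatever
// the extractor pulls out of it.
// $fetchPage: callable(string $url): string   (e.g. 'file_get_contents')
// $extract:   callable(string $html): string  (site-specific)
function fillFromLinkedPages(array $posts, callable $fetchPage, callable $extract): array
{
    foreach ($posts as $i => $post) {
        $html = $fetchPage($post['link']);
        $posts[$i]['content'] = $extract($html);
    }
    return $posts;
}

// Hypothetical stand-ins so the sketch runs without touching the network.
$fakeFetch = function (string $url): string {
    return '<p>The whole article from ' . $url . '</p>';
};
$stripTags = function (string $html): string {
    return trim(strip_tags($html));
};

// A "partial feed" item: the content is just a teaser.
$partial = [
    ['title' => 'First post', 'link' => 'http://example.com/first', 'content' => 'The whole...'],
];

$filled = fillFromLinkedPages($partial, $fakeFetch, $stripTags);
echo $filled[0]['content'], "\n";
```

The partial feed still hands over the link, and the link is all the loop needs.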
And, lest you think that the extra step is somehow difficult, here's a quick example of how to get the text of a post from this site (note that this site not only offers full-text feeds, but also licenses the content of those feeds under the Creative Commons Attribution license, so scraping isn't actually necessary here). This only gets the plain text, without HTML tags, links, etc., but it proves the point of just how easy it is to extract the content.
<?php
// Collect HTML parse warnings quietly instead of silencing every PHP error.
libxml_use_internal_errors(true);

$url = "http://www.wynia.org/wordpress/2007/01/06/i-want-to-have-been-able-to-get-it-on-my-own/";

$doc = new DOMDocument();
if (!$doc->loadHTMLFile($url)) {
    exit("Could not load $url\n");
}

// Keep the text of the <div> whose class is "entry": the post body on this site.
$output = "";
foreach ($doc->getElementsByTagName('div') as $tag) {
    if ($tag->getAttribute('class') == "entry") {
        $output = $tag->nodeValue;
    }
}

file_put_contents("output.html", $output);
?>
All I had to do was take a quick peek at my HTML to find the element that contains the actual post content; the rest was pretty straightforward.
Now, if I were out to pull content from sites and re-post it, you can be pretty sure that I'd only have to spend a day or so writing filters for the sites I'd be targeting and there'd be no difference to me.
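A "filter" in that sense doesn't have to be more than one line per site. The sketch below assumes a lookup table from host name to an XPath expression. The www.wynia.org entry matches the "entry" class used on this site; the blog.example.com entry, the post-body id, and the function name extractWithFilter are invented for illustration.

```php
<?php
// Hypothetical per-site filters: host name => XPath for the post body.
$filters = [
    'www.wynia.org'    => '//div[contains(concat(" ", normalize-space(@class), " "), " entry ")]',
    'blog.example.com' => '//*[@id="post-body"]',
];

// Return the plain text of the post body, or null if no filter applies.
function extractWithFilter(string $url, string $html, array $filters): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || !isset($filters[$host]) || $html === '') {
        return null; // no filter written for this site, or nothing to parse
    }

    // Real-world HTML is sloppy; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $nodes = (new DOMXPath($doc))->query($filters[$host]);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Inline sample page so the sketch runs offline.
$page = '<html><body><div class="sidebar">Links</div>'
      . '<div class="post entry">The actual article text.</div></body></html>';
echo extractWithFilter('http://www.wynia.org/some-post/', $page, $filters), "\n";
```

Adding a new target site means adding one more row to the table, which is why a partial feed makes no practical difference to someone doing this at scale.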
Now, I'm not advocating copyright infringement and am not insisting that anyone else change what they do. I'm just thoroughly sick of hearing this stated as the *reason* for doing it. It's a crutch and it's just not true.
And, yes, I do feel better.

January 7th, 2007 at 8:06 pm
Hey, this is not as bad as back in the day when some sites would disable right-click. That level of ignorance made my mouse hand bleed on the inside.
All this DRM stuff is simply an annoyance. If there's a way to actually display media, there's a way to record it.
January 7th, 2007 at 8:32 pm
The right click thing still pops up from time to time. It's particularly irritating in modern tabbed browsers. Somehow my attempt to launch a link in a new tab is going to infringe on their text copyright. Of course, the copy in my browser cache is somehow only in my imagination.
January 7th, 2007 at 9:12 pm
Thank you!
I've been scraping HTML pages for quite some time (for good, not evil). Using a partial feed to stop content stealing not only doesn't help, but it hurts you by not allowing people to effectively subscribe to your feed, which hurts readership.
January 9th, 2007 at 6:52 am
Yeah, I know that people, if they are dedicated enough, will steal my content no matter what. The reason I use partial feeds is because I want my readers to read the rest of the article.
January 9th, 2007 at 8:58 pm
Kyle, that makes absolutely no sense. If you produce full feeds, the RSS readers already *have* the complete article. There is no "rest" for them to read.