Excerpt vs. Full vs. the Glass Too Big Approach

Originally published: 12/2005 by J Wynia

First, I want to apologize to those of you reading via Bloglines for the big pile of messages that just showed up as unread. Bloglines takes an approach to read/unread that makes what I'm trying to do difficult without the results you're seeing.

For every post in an RSS feed, most aggregators use a field called "guid", which is unique for each item. If you've read that item, they consider it read and don't mark it unread unless you specifically say so. Bloglines, on the other hand, appears to take the approach that if ANYTHING in the item has changed, it must be read again. So the earlier footer message I had, pointing to my other sites, caused everything to pop up as unread whenever the message changed.
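To make the difference concrete, here is a minimal Python sketch of the two read/unread strategies. It is only an illustration under assumptions: the item dictionaries, the function names, and the use of a SHA-256 hash to detect changes are all hypothetical, and this is not how Bloglines or any other aggregator is known to be implemented.

```python
import hashlib

def guid_key(item):
    # Strategy 1: an item is identified by its guid alone.
    return item["guid"]

def content_key(item):
    # Strategy 2 (assumed): any change to the item body yields a new key.
    digest = hashlib.sha256(item["description"].encode("utf-8")).hexdigest()
    return item["guid"] + ":" + digest

def unread_items(items, seen_keys, key_func):
    # Items whose key has not been seen before are shown as unread.
    return [item for item in items if key_func(item) not in seen_keys]

# The same post before and after its footer line changes.
original = {"guid": "post-1", "description": "Body text. Footer A"}
edited = {"guid": "post-1", "description": "Body text. Footer B"}

seen_by_guid = {guid_key(original)}
seen_by_content = {content_key(original)}

print(len(unread_items([edited], seen_by_guid, guid_key)))
print(len(unread_items([edited], seen_by_content, content_key)))
```

Under the guid-only strategy the edited post stays read; under the content-sensitive strategy it comes back as unread, which matches the behavior described above.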

Well, yesterday I put in another change that is going to make that happen again, and I apologize. However, it's there for a good reason. I keep seeing people cite unattributed theft of content as their reason for publishing only partial feeds. You know the ones: they give you a single paragraph in the feed and require you to visit the site to get the rest of the article.

They see it as a choice between publishing the full version, which is what most readers want, and publishing only an excerpt, because of the few places that republish their content elsewhere. To me, this seems like a scorched-earth approach where something more subtle could work.

What I've added to my feeds is a single line at the bottom that shows which IP address made the request. I may have to switch to something more distinctive if aggregators are using multiple IPs to request content. If you're seeing that line via Bloglines or another service, please let me know. My intent over the next few weeks is to build a whitelist of the major services like Bloglines so that they don't see these messages at all.
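A rough sketch of that footer line, in Python rather than whatever the feed script is actually written in. The function name and the HTML wrapping are assumptions; only the "Requested by:" wording comes from what readers report seeing.

```python
import html

def add_request_footer(item_html, remote_addr):
    # Append one line naming the address that requested the feed.
    # remote_addr is escaped because it comes from the request.
    footer = "<p>Requested by: " + html.escape(remote_addr) + "</p>"
    return item_html + "\n" + footer

print(add_request_footer("<p>Post body</p>", "64.34.162.9"))
```

In a real feed script the address would come from the web server's request data (for example, the CGI `REMOTE_ADDR` variable) instead of being passed in by hand.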

For the rest of the web, I can then track where the request for the content that's being used inappropriately came from. The feed script will then check the requesting IP against both a whitelist and a blacklist. Whitelisted addresses will see no message at all, blacklisted addresses will see a message warning readers that the feed is being used inappropriately, and everyone else will just see the little note about the requesting IP.
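The three-way check could look something like the following Python sketch. Everything here is illustrative: the function names and the warning text are made up, and the addresses are reserved documentation ranges, not real aggregators or offenders. Entries may be single addresses or CIDR ranges, which is one way to cope with a service that fetches from several IPs.

```python
import ipaddress

# Hypothetical warning text shown to readers of a blacklisted site.
WARNING = "This feed is being republished without permission."

def in_any(addr, networks):
    # True if addr falls inside any listed address or CIDR range.
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(net) for net in networks)

def footer_for(remote_addr, whitelist, blacklist):
    if in_any(remote_addr, whitelist):
        return ""  # trusted aggregators see no extra line
    if in_any(remote_addr, blacklist):
        return WARNING  # offending sites get the warning
    return "Requested by: " + remote_addr  # everyone else

# Placeholder lists using documentation-only address ranges.
whitelist = ["192.0.2.0/24"]
blacklist = ["203.0.113.5"]

for addr in ("192.0.2.10", "203.0.113.5", "198.51.100.7"):
    print(addr, "->", repr(footer_for(addr, whitelist, blacklist)))
```

Checking the whitelist first means a trusted address never sees a warning, even if it is also added to the blacklist by mistake.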

Then all I have to do (or, more accurately, all that people using the resulting code have to do) is add the offending IP addresses to a file or database table and specify what those feeds should contain, and the offending site goes away.
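As a sketch of the file-based variant, the Python below reads a small rules file and swaps in the replacement content for listed addresses. The "address | replacement" line format, the function names, and the sample address are all assumptions made for illustration; the post does not specify a format.

```python
import io

def load_blacklist(lines):
    # Parse "address | replacement text" lines into a dict.
    # Blank lines and lines starting with "#" are ignored.
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        address, _, message = line.partition("|")
        rules[address.strip()] = message.strip()
    return rules

def feed_body(remote_addr, real_body, rules):
    # Listed addresses get their replacement; others get the real post.
    return rules.get(remote_addr, real_body)

# An in-memory stand-in for the rules file (placeholder address).
sample = io.StringIO(
    "# address | replacement\n"
    "203.0.113.5 | This content was taken from another site's feed.\n"
)
rules = load_blacklist(sample)

print(feed_body("203.0.113.5", "Real post", rules))
print(feed_body("198.51.100.1", "Real post", rules))
```

With a real file, `open("blacklist.txt")` (a hypothetical filename) could be passed to `load_blacklist` in place of the `io.StringIO` object.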

This technique has long been used against those who steal images. Simply replace the real image with "This image was stolen from ABC site" and the problem is taken care of.

Incidentally, the IP address that Bloglines used to request the feed for my viewing is 64.34.162.9. Does that match what other Bloglines users are seeing?

Comments

Billy on 12/19/2005
I wish I could follow this, sounds like something I'd be interested in but I can't get in thru my noggin, sorry for my ignorance lol. Other than, yes, I did recv all the msgs again via Bloglines, but no biggie.
J Wynia on 12/19/2005
Basically it comes down to the fact that Bloglines is unique in the whole repeated entries thing, but the new stuff I'm doing to combat other sites using the feed content without playing by the rules will also benefit Bloglines users as if I get the right set of IP addresses named as being "Bloglines", I can exclude them from some of these changes to prevent the problems.
Billy on 12/19/2005
crystal, I'll try to catch them next time, thx
Sam on 12/19/2005
Bloglines is showing me "Requested by: 216.148.212.188" I always wondered why your entire feed was showing up every time you posted something new. I suppose I'll change my prefs to ignore your updates. Should have thought of that before.
Billy on 12/19/2005
okay, smokey crystal... Where do I go in BL to see the "Requested by.." statement? TIA
J Wynia on 12/19/2005
OK, clearly Bloglines is doing some widespread network fetching of feeds. Between Sam's report and my own fetches this morning and after work, I'm getting IP addresses that map to server2.broadwords.com, www.syndic8.com, and cr03.bloglines.com, all being used by Bloglines to fetch the feed's contents. Getting a decent list is going to take some time.
J Wynia on 12/19/2005
The Requested by is actually the last line of the content itself.
Kean on 12/20/2005
J, FYI this caused a hiccup with Google Reader as well. (Google Reader shows... Requested by: 64.78.155.100)
Ricky Spears on 12/20/2005
My bloglines IP said: 65.214.39.151 Good luck with the project!
© 2003-2013 J Wynia. Very Few Rights Reserved. This article is licensed under the terms of the Creative Commons Attribution License. Quoted content or content included from others is not subject to that license and defaults to normal copyright.