The Glass is Too Big - Home

RSS2Email Extending, RSS Analysis and Yahoo Keyword Extractor Example Fixed

Originally published on: 11/3/2005 6:46:35 PM

When you start looking at RSS as a giant pile of feeds and items from which to pull and filter, and you combine the proliferation of RSS feeds, web service APIs and just plain PHP/MySQL duct tape, there isn't much you can't do to get exactly what you want out of it.

So, while trying to get my mind off of what was stopping forward motion for Thomas Barkett's story line (turns out it was the flight I put him on instead of having him drive: a detour that's put me behind on word count), I strung together a few previously existing experiments to push the idea of RSS via IMAP a little further as a view. I also discovered that Wordpress is another in a long series of tools that royally #$%^&'ed up my intentions in an effort to be "helpful" and broke the Yahoo keyword extraction posting.



I discovered the broken page when I went to my own posting to grab the code. In making the whole posting XHTML compliant, Wordpress mangled the "result" tags that the sample PHP manipulates. So, I re-encoded the sample code and it should be better now. Sorry to anyone who got frustrated with it.

I was grabbing that code because I wanted to start injecting metadata into the headers of the emails that the RSS2Email script sends out. Putting things like keywords into X-Headers lets sophisticated IMAP clients like Outlook and Thunderbird filter, sort and extract emails according to purpose. Eventually, I intend to tag feeds with the names of the blogs I write on where the feed is relevant. That way, I can open a saved filter for this site and exclude all of the stuff that's just for my personal edification. Additionally, if you've got things like Technorati rank for a site, Google PageRank, a manual credibility tag, etc. on the feeds themselves, you can filter through a breaking story to see whether the people you put your credence in have weighed in on the issue or not.

Eventually, I'm going to put a database between the fetching of the feeds and the sending out of email. It's easier on bandwidth, it decouples the refreshing of feeds from the sending of the emails, and it's just plain a better idea. For a proof of concept, though, leaks are OK.
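To make the X-Header filtering idea concrete, here's a rough sketch of the kind of check a mail client's filter performs, done in plain PHP against a raw header block. The function names and the sample headers are made up for illustration, and it ignores folded (multi-line) headers, so treat it as a toy rather than a real parser.

```php
<?php
// Hypothetical helper: read one header's value out of a raw header block.
// Header names are matched case-insensitively; folded headers are not handled.
function get_header_value($raw_headers, $name)
{
    foreach (preg_split('/\r?\n/', $raw_headers) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), $name) === 0) {
            return trim($parts[1]);
        }
    }
    return null; // header not present
}

// Hypothetical filter: does the comma-separated keyword header list this term?
function has_keyword($raw_headers, $term)
{
    $value = get_header_value($raw_headers, 'X-Yahoo-Keywords');
    if ($value === null) {
        return false;
    }
    $keywords = array_map('trim', explode(',', strtolower($value)));
    return in_array(strtolower($term), $keywords, true);
}

// Made-up sample headers, not output from the real script.
$sample = "Subject: Example item\nX-Yahoo-Keywords: rss feeds, imap\n";
var_dump(has_keyword($sample, 'IMAP')); // matches the whole keyword "imap"
```

The match is on whole keywords, so "rss" alone would not match "rss feeds"; a looser substring match might suit a saved search better.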

I decided to add the Yahoo keywords first because I'd already written the code. The result is a header in each email for the Yahoo keywords.



I used the code from the earlier article as a function and called it to get a keyword string:



$yahoo_keywords = trim(fetch_yahoo_keywords($description));
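The function itself lives in the earlier article and isn't repeated here. As a rough sketch of just the parsing half, assuming the service hands back an XML document with one Result element per keyword, something like the following would produce the keyword string. The function name and the sample response are made up for illustration and aren't the original code.

```php
<?php
// Hypothetical helper: pull the text of every <Result> element out of an
// XML response and join the terms into one comma-separated string.
function join_result_terms($xml_string)
{
    // Collect libxml errors quietly so bad XML just yields an empty string.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xml_string);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return '';
    }

    $terms = array();
    foreach ($xml->Result as $result) {
        $term = trim((string) $result);
        if ($term !== '') {
            $terms[] = $term;
        }
    }
    return implode(', ', $terms);
}

// Made-up sample response, not real service output.
$sample = '<ResultSet><Result>rss feeds</Result><Result>imap</Result></ResultSet>';
echo join_result_terms($sample), "\n";
```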



I then added it to the RSS2Email script as an X-Header:



$headers = "X-Yahoo-Keywords: $yahoo_keywords\n";



$headers is just a string of all of your custom headers concatenated together. Each needs to be on its own line (hence the "\n" newline) with the name followed by a colon and the value. Within the email setup itself (where we add the senders, etc.), one extra line adds that string to the headers:



$mail->AddCustomHeader($headers);
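Since the keywords come from outside content, it's probably worth stripping line breaks before they go into a header, or a stray newline could turn into an extra header. Here's a minimal sketch of that, assuming a small helper (the name build_custom_headers and the X-Relevant-Blog header are invented for this example) that turns a name => value array into clean "Name: value" lines.

```php
<?php
// Hypothetical helper: turn a name => value array into "Name: value" lines,
// stripping CR/LF so feed content can't smuggle in extra headers.
function build_custom_headers(array $fields)
{
    $lines = array();
    foreach ($fields as $name => $value) {
        // Collapse any line breaks (and the whitespace around them) into one space.
        $clean = trim(preg_replace('/\s*[\r\n]+\s*/', ' ', $value));
        if ($clean === '') {
            continue; // skip empty headers entirely
        }
        $lines[] = $name . ': ' . $clean;
    }
    return $lines;
}

$lines = build_custom_headers(array(
    'X-Yahoo-Keywords' => "rss feeds, imap\r\nBcc: someone@example.com",
    'X-Relevant-Blog'  => 'example-blog', // made-up header for illustration
));

// One string, for a script that wants a single block of headers...
$headers = implode("\n", $lines) . "\n";
echo $headers;

// ...or, assuming $mail is a PHPMailer object, one call per header:
// foreach ($lines as $line) { $mail->AddCustomHeader($line); }
```

If $mail is a PHPMailer object, as the AddCustomHeader() call suggests, that method takes a single "Name: value" header per call, so once there is more than one header the loop is likely the safer route.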



That's really it. From here on out, it's just a matter of adding more and more analysis of the items, including some of this stuff in the body itself as well as getting it all into a database so that the email isn't the only record of the analyzed content. I'd really like to also be able to pull a newspaper view for each blog or do other analysis of story trends. I might start taking the bus again this winter and those would be useful for the ride.

After November, I'm thinking of putting a machine at home to work just digging through feeds 24 hours a day and doing analysis. Syndic8 has a downloadable list of feeds that I grabbed at one point and that should prove interesting to start harvesting. (I originally wrote "had" here because I thought the site had turned into a domain name prospecting page; see the comments.) I want a huge data store to start doing data mining on.

rss, rss2email, yahoo keyword extractor, rss analysis

Comments

Roy
commented on 11/4/2005
Typo alert! http://www.syndic8.com is still up and running.
J Wynia
commented on 11/4/2005
Thanks for the correction. I'll change it in the article, but leave the "my bad" up as well. Freaking "c".
© 2003- 2010 J Wynia. Very Few Rights Reserved.