When I look at an OPML file, I often want to sample it to see what the included sites are like. I don't want to import the whole thing into my feeds sight unseen, or to visit each site one at a time.
What I wanted was a way to see a "good" post from each feed to get a quick feel for it. So, when I couldn't sleep last night (not feeling particularly well), I kept my mind busy writing a solution. You can see a sample of the output, which shows you the sampler for the Web 2.0 Workgroup's OPML.
This one's quite a bit of code, so you'll just have to download the whole zip file instead of me posting all of it here. What follows is how to put this on your own site and a quick overview of what's going on.
To follow along at home, you need to download the zip file and put it on your server. Make sure the directory is writable. Also make sure you've got an "appid" from the Yahoo Developer Network. The code was tested on PHP5 on Linux, and any use is at your own risk. You'll also need a copy of MagpieRSS. If you've been following many of my RSS scripts, you'll have a central copy of it for your site to avoid having to install it over and over again.
So, what exactly is going on? We'll take an OPML file and load a snapshot of each feed's current contents. Then we'll loop through each feed item and ask Yahoo how many pages link to it. For each feed, we'll extract the one item that is most linked to (roughly the equivalent of Google's PageRank choosing the top page for a search). That "top" post will be put on our sampler page as the example of the feed's content. The sampler page will then contain one item from each feed as well as some information about the feed itself.
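To make the selection step concrete, here is a minimal sketch of picking the most-linked item from a single feed. It is not the code from the zip file: the function name `pickTopItem`, the item shape (arrays with `title` and `link` keys) and the `$linkCount` callable are all assumptions for illustration, and the callable simply stands in for wherever the Yahoo query would go. It is also written for a current PHP rather than the PHP5 the script was tested on.

```php
<?php
/**
 * Pick the most-linked item from one feed.
 *
 * $items     list of arrays with 'title' and 'link' keys (assumed shape)
 * $linkCount hypothetical callable: takes a URL, returns its inbound-link count
 */
function pickTopItem(array $items, callable $linkCount): ?array
{
    $best = null;
    $bestCount = -1; // lower than any real count, so the first item always wins initially

    foreach ($items as $item) {
        $count = (int) $linkCount($item['link']);
        if ($count > $bestCount) {
            $best = $item;
            $bestCount = $count;
        }
    }

    // Attach the winning count so the sampler page could display it.
    return $best === null ? null : $best + ['inbound' => $bestCount];
}

$items = [
    ['title' => 'First post',   'link' => 'http://example.com/a'],
    ['title' => 'Popular post', 'link' => 'http://example.com/b'],
    ['title' => 'Quiet post',   'link' => 'http://example.com/c'],
];

// Canned counts stand in for the live API so the sketch runs offline.
$counts = [
    'http://example.com/a' => 3,
    'http://example.com/b' => 41,
    'http://example.com/c' => 0,
];

$top = pickTopItem($items, function ($url) use ($counts) {
    return $counts[$url] ?? 0;
});

echo $top['title'], ' (', $top['inbound'], " inbound links)\n";
```

Passing the counter in as a callable keeps the ranking logic separate from the API call, so the lookup could be swapped for a cached or different link-count source without touching the loop.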
A few caveats before we go much further. There is caching in the script, but you're still going to be not only fetching every item in every feed, but also hitting the Yahoo API once for each item. Yahoo limits access to 5000 hits per day per IP address. That sounds like a lot, but consider my personal feedlist OPML: 250 feeds with 10 items each puts just one pass through at 2500 queries, even with the once-per-day caching in place. As such, this script has the potential to get your IP banned pretty quickly. So, with great power comes great responsibility. If you decide to mess with this, BE CAREFUL. I will not be held responsible if you feed 1000 feeds into this thing and Yahoo smacks you down. I'd personally recommend saving the output as a static HTML file and posting that wherever you'd like your page to go, instead of having anything actually hit it live.
OK, enough of that, let's build something.
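The arithmetic above is worth running against your own OPML before you point the script at it. Here is a rough sketch; the function names are made up for illustration, and the 10 items per feed used for the 1000-feed case is an assumption carried over from the feedlist example.

```php
<?php
// One uncached pass costs one API query per feed item.
function estimateQueries(int $feeds, int $itemsPerFeed): int
{
    return $feeds * $itemsPerFeed;
}

// How many full passes fit under a daily limit (integer division rounds down).
function passesPerDay(int $queriesPerPass, int $dailyLimit): int
{
    return $queriesPerPass > 0 ? intdiv($dailyLimit, $queriesPerPass) : 0;
}

$perPass = estimateQueries(250, 10);     // the feedlist from the text
$passes  = passesPerDay($perPass, 5000); // the daily limit quoted in the text

echo "$perPass queries per pass, $passes passes per day\n";

// 1000 feeds at an assumed 10 items each is already over the limit in one pass.
$big = estimateQueries(1000, 10);
echo $big <= 5000 ? "1000 feeds fit\n" : "1000 feeds need $big queries, over the limit\n";
```

With the numbers from the text, that leaves room for only two full passes a day, which is why a static copy of the output is the safer way to publish it.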
What you need to change is the index.php file and the yahoo.php file. Inside the index.php file, you change the OPML URL. The one included is the same as the one from the sample page. Then, on line 23, change "REPLACEWITHYOURS" to your Yahoo API key. If you email me with a complaint that this doesn't work and you left that key in place, I reserve the right to mock you publicly.
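If you'd like to catch the usual setup mistakes before the script starts burning through API calls, a small preflight check along these lines could help. This is a hypothetical helper, not something in the zip file: the function name and its parameters are assumptions, and only the "REPLACEWITHYOURS" placeholder and the writable-directory requirement come from the setup notes above.

```php
<?php
// Hypothetical preflight check; none of these names come from the zip file.
function preflightProblems(string $dir, string $appId, string $opmlUrl): array
{
    $problems = [];

    // The script needs to write into its own directory.
    if (!is_dir($dir) || !is_writable($dir)) {
        $problems[] = "Directory is not writable: $dir";
    }

    // The shipped placeholder must be replaced with a real appid.
    if ($appId === '' || $appId === 'REPLACEWITHYOURS') {
        $problems[] = 'Yahoo appid is still the placeholder';
    }

    // Only a syntax check; it does not fetch the OPML file.
    if (filter_var($opmlUrl, FILTER_VALIDATE_URL) === false) {
        $problems[] = "OPML URL does not look valid: $opmlUrl";
    }

    return $problems;
}

// Example run with the placeholder left in place.
$problems = preflightProblems(
    sys_get_temp_dir(),
    'REPLACEWITHYOURS',
    'http://example.com/feeds.opml'
);

foreach ($problems as $problem) {
    echo $problem, "\n";
}
```

Returning a list of problems instead of stopping at the first one means a fresh install reports everything that needs fixing in a single run.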
Also in the ZIP file, you'll see an xml.php file. This is the general XML library I use to dig through the XML Yahoo sends back. You don't need to do anything with that file.
That's pretty much it for just getting it to work. I tried to comment liberally, but consider that much of this was written overnight while I wasn't able to sleep.
[Edit: Looks like there's a bug I need to fix. (WHAT? You mean there's a bug in midnight code?) Basically, if none of the items in a given feed have any inbound links in Yahoo, the item from the previous feed will still be in the queue and will be used. All that needs to be added is a step that puts the first item of each feed into that spot before the comparison starts. I'll get to it as soon as I get a chance.]
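For anyone who wants to patch this themselves in the meantime, the fix described in the edit could look something like the sketch below. It is not the actual patch: `buildSampler`, the feed and item shapes, and the `$linkCount` callable (a stand-in for the Yahoo lookup) are all assumptions. The point is only that the "best so far" slot is reset for every feed and seeded with that feed's own first item.

```php
<?php
// $feeds maps a feed title to a 0-indexed list of items, each with
// 'title' and 'link' keys (assumed shape). $linkCount is hypothetical.
function buildSampler(array $feeds, callable $linkCount): array
{
    $sampler = [];

    foreach ($feeds as $feedTitle => $items) {
        if (count($items) === 0) {
            continue; // nothing to sample from an empty feed
        }

        // Reset per feed, seeded with this feed's first item, so a
        // leftover pick from the previous feed can never be reused.
        $best = $items[0];
        $bestCount = 0;

        foreach ($items as $item) {
            $count = (int) $linkCount($item['link']);
            if ($count > $bestCount) {
                $best = $item;
                $bestCount = $count;
            }
        }

        $sampler[$feedTitle] = $best['title'];
    }

    return $sampler;
}

$feeds = [
    'Busy blog' => [
        ['title' => 'A', 'link' => 'http://example.com/a'],
        ['title' => 'B', 'link' => 'http://example.com/b'],
    ],
    'Quiet blog' => [ // no item here has any inbound links
        ['title' => 'C', 'link' => 'http://example.org/c'],
        ['title' => 'D', 'link' => 'http://example.org/d'],
    ],
];

$counts = ['http://example.com/b' => 7];

$sampler = buildSampler($feeds, function ($url) use ($counts) {
    return $counts[$url] ?? 0;
});

print_r($sampler);
```

In this run the quiet feed falls back to its own first post, "C", instead of inheriting "B" from the feed before it.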
The XML library is under a license similar to the Artistic License, but that library could be swapped out and the concept still used. All of this is under that license too, because I like the freedoms in its author's license and, though I first misread it, it conveys my intentions anyway.