OPML Sampling: Build a Page Showing the "Best" Item From Each RSS Feed

Originally published: 11/2005 by J Wynia

OPML files are a pretty general tool. However, one of their most common uses is to distribute a list of RSS feeds. I've got mine up (contains something like 200-250 feeds). They're also popping up in blogging networks and as a way to share groups of related sites.
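To make the "list of RSS feeds" idea concrete, here is a minimal Python sketch (not the PHP from the zip file) that pulls feed URLs out of an OPML document. It assumes the common convention of storing each feed's address in an xmlUrl attribute on an outline element; the sample document and the feed_urls name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML document, used only for this illustration.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Example feed list</title></head>
  <body>
    <outline text="Group">
      <outline text="Feed A" type="rss" xmlUrl="http://example.com/a.xml"/>
    </outline>
    <outline text="Feed B" type="rss" xmlUrl="http://example.com/b.xml"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return the xmlUrl of every outline element that has one, in document order."""
    root = ET.fromstring(opml_text)
    # iter() walks nested outlines too, so grouped feeds are included.
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


if __name__ == "__main__":
    for url in feed_urls(SAMPLE_OPML):
        print(url)
```

Outlines that only act as folders carry no xmlUrl and are skipped.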

When I look at an OPML file, I often want to sample it: to see what the included sites are like. I don't want to import the whole thing into my feeds sight unseen, or to visit each site one at a time.

What I wanted was a way to see a "good" post from each feed to get a quick feel for it. So, when I couldn't sleep last night (not feeling particularly well), I kept my mind busy writing a solution. You can see a sample of the output, which shows you the sampler for the Web 2.0 Workgroup's OPML.

This one's quite a bit of code, so, you'll just have to download the whole zip file instead of me posting all of the code. What follows is how to put this on your own site and a quick overview of what's going on.

To follow along at home, you need to download the zip file and put it on your server. Make sure the directory is writable. Also make sure you've got an "appid" from the Yahoo Developer Network. The code was tested on PHP5 on Linux and any use is at your own risk. You'll also need a copy of MagpieRSS. If you've been following many of my RSS scripts, you'll have a central copy of it for your site to avoid having to install it over and over again.

So, what exactly is going on? We'll take an OPML file and load a snapshot of each feed's current contents. Then, we'll loop through each feed item and ask Yahoo how many pages link to it. For each feed, we'll extract the one item that is most linked to (roughly the equivalent of Google's PageRank for choosing the top page for a search). That "top" post will be put on our sampler page as the example of the feed's content. The sampler page will then contain one item from each feed as well as some information about the feed itself.
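The selection step can be sketched in a few lines. This is a Python illustration of the idea, not the PHP in the zip file: the feed data is made up, and count_inbound_links is a hypothetical stand-in for the Yahoo lookup, so nothing here touches the network.

```python
def pick_top_items(feeds, count_inbound_links):
    """Pick the most linked-to item from each feed.

    feeds maps a feed title to a list of item dicts with "title" and "link".
    count_inbound_links is any callable that takes a URL and returns a number.
    """
    sampler = {}
    for feed_title, items in feeds.items():
        if not items:
            continue  # an empty feed has nothing to show
        # max() keeps the first item on a tie, so a feed whose items all
        # have zero inbound links still contributes its first item.
        sampler[feed_title] = max(
            items, key=lambda item: count_inbound_links(item["link"])
        )
    return sampler


# Made-up data standing in for parsed feeds and search results.
FAKE_LINK_COUNTS = {
    "http://example.com/a/1": 4,
    "http://example.com/a/2": 17,
    "http://example.com/b/1": 0,
    "http://example.com/b/2": 0,
}

FEEDS = {
    "Feed A": [
        {"title": "A one", "link": "http://example.com/a/1"},
        {"title": "A two", "link": "http://example.com/a/2"},
    ],
    "Feed B": [
        {"title": "B one", "link": "http://example.com/b/1"},
        {"title": "B two", "link": "http://example.com/b/2"},
    ],
}

if __name__ == "__main__":
    top = pick_top_items(FEEDS, lambda url: FAKE_LINK_COUNTS.get(url, 0))
    for feed_title, item in top.items():
        print(feed_title, "->", item["title"])
```

Passing the lookup in as a function keeps the ranking logic separate from whichever search API supplies the counts.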

A few caveats before we go much further. There is caching in the script, but you're still going to be not only fetching every item in every feed, but also hitting the Yahoo API once for each item as well. Yahoo limits access to 5000 hits per day per IP address. That sounds like a lot, but consider my personal feedlist OPML. 250 feeds with 10 items each puts just one pass through at 2500 queries, even with the once-per-day caching in place. As such, this script has the potential to get your IP banned pretty quickly. So, with great power comes great responsibility. If you decide to mess with this, BE CAREFUL. I will not be held responsible if you feed 1000 feeds into this thing and Yahoo smacks you down. I'd personally recommend saving the output as a static HTML file and posting that wherever you'd like your page to go, instead of having anything actually hit it live.
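To put numbers on that warning, here is a small Python sketch of the query arithmetic along with a once-per-day cache wrapped around a lookup function. It is an illustration only: DailyCache and queries_per_pass are hypothetical names, and the cache is in memory, which is not necessarily how the script in the zip file stores its results.

```python
import time

DAILY_LIMIT = 5000  # hits per day per IP address, the figure quoted above


def queries_per_pass(feed_count, items_per_feed):
    """One API query per feed item, so a full pass costs feeds * items."""
    return feed_count * items_per_feed


class DailyCache:
    """Remember lookup results and reuse them for up to max_age_seconds."""

    def __init__(self, lookup, max_age_seconds=24 * 60 * 60, clock=time.time):
        self._lookup = lookup
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}  # url -> (time stored, value)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]  # still fresh, no API hit needed
        value = self._lookup(url)
        self._entries[url] = (now, value)
        return value


if __name__ == "__main__":
    cost = queries_per_pass(250, 10)
    print("queries per uncached pass:", cost)
    print("uncached passes that fit in a day:", DAILY_LIMIT // cost)
```

With 250 feeds of 10 items each, two uncached passes use up the whole daily allowance, which is why generating the page once and serving the static HTML is the safer route.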

OK, enough of that, let's build something.

What you need to change is the index.php file and the yahoo.php file. Inside the index.php file, you change the OPML URL. The one included is the same as the one from the sample page. Then, on line 23, change "REPLACEWITHYOURS" to your Yahoo API key. If you email me with a complaint that this doesn't work and you left that key in place, I reserve the right to mock you publicly.

Also in the zip file, you'll see an xml.php file. This is the general XML library I use to dig through the XML Yahoo sends back. You don't need to do anything with that file.

That's pretty much it for just getting it to work. I tried to comment liberally, but consider that much of this was written overnight while I wasn't able to sleep.

The XML library is under a license similar to the Artistic license. All of this is too, because I like the freedoms in his license and, though I first misread it, it conveys my intentions anyway. The XML library could be swapped out and the concept still used.

[Edit: Looks like there's a bug I need to fix (WHAT? You mean there's a bug in midnight code?) Basically, if none of the items in a given feed have any inbound links in Yahoo, the item from the previous feed will still be in the queue and will be used. All that needs to be added is to make sure that the first one in a feed is put into the spot. I'll get to it as soon as I get a chance.]
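One way the fix described in that note might look, sketched in Python rather than the script's PHP. The names are hypothetical and this is only a guess at the shape of the loop: the point is that the "best so far" slot is reset for every feed and seeded so the first item always takes the spot.

```python
def top_item_per_feed(feeds, count_inbound_links):
    """Return one item per non-empty feed, never reusing a previous feed's item.

    feeds is a list of feeds, each a list of items.
    count_inbound_links is any callable returning a number for an item.
    """
    results = []
    for items in feeds:
        best_item = None  # reset for every feed so nothing carries over
        best_count = -1   # below zero, so the first item always takes the spot
        for item in items:
            count = count_inbound_links(item)
            if count > best_count:
                best_item, best_count = item, count
        if best_item is not None:
            results.append(best_item)
    return results


if __name__ == "__main__":
    # Made-up counts: nothing in the second feed has any inbound links.
    counts = {"a1": 3, "a2": 9, "b1": 0, "b2": 0}
    print(top_item_per_feed([["a1", "a2"], ["b1", "b2"]], counts.get))
```

In the example, the second feed falls back to its own first item instead of inheriting "a2" from the feed before it.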

RSS, OPML, PHP, scripting, programming, yahoo api, web 2.0

Comments

» Build a Preview Page of Your Empire: OPML Sampler - Empire of Blog on 11/25/2005
[...] I wrote up a script using OPML to present a sample posting from each feed in the OPML file. So, what the script does is finds the post within each RSS feed that has the most other pages in Yahoo’s search linking to it. As a general rule, it will therefore show the most “interesting” post most of the time, automatically. [...]
CrunchNotes » Mashing up OPML and Yahoo’s Search API on 11/25/2005
[...] J Wynia has done something interesting with our Web 2.0 Workgroup OPML file: He’s taken the individual feeds and is running feed posts through Yahoo’s search API to find the most interesting stuff. The end result is here. His overview of what he’s created is here: So, what’s exactly going on? We’ll take an OPML file, load a snapshot of each feed’s current contents. Then, we’ll loop through each feed item and ask Yahoo how many pages link to it. For each feed, we’ll extract the one item that is most linked to (aka the equiv of Google’s pagerank for choosing the top page for a search). That “top” post will be put on our sampler page as the example of of the feed’s content. The sample page will then contain one item from each feed as well as some information about the feed itself. [...]
sahadeva on 11/25/2005
nice work. Sorta like what I'd envisioned here just a few days ago.
Keith Devens on 11/26/2005
> Since the XML library is under the Artistic license, all of this is too, but that could be swapped out and the concept still used. Note that it's not under the Artistic license, though my license is similar. And, that's not how it works. You can do whatever you want with the code and it doesn't impact the license you release your own code under (except for your code that's specifically an addition to the library).
packet filter » Blog Archive » OPML sampling on 11/26/2005
[...] OPML Sampling: Build a Page Showing the “Best” Item From Each RSS Feed– The Glass is Too Big - J Wynia [...]
J Wynia on 11/26/2005
Sorry for the confusion Keith, must have misread the "similar to". I'll clarify the text, but my intent was to just make this as free as I was allowed to, so I'll still put it all under your license. I usually put my stuff under BSD or something similar.
ekepler@x-forms.com on 11/26/2005
Great Job Jay.
Toni Schneider’s Blog » Feed sharing on 11/26/2005
[...] J Wynia has created an “OPML sampler“. It creates a nice summary view of someone’s list of RSS subscriptions. Here’s an example. This feels like a great basis for letting people share their RSS subcription lists. Instead of today’s process of having to import someone’s OPML file, subscribe to all their feeds, read the feeds and unsubscribe from the ones I don’t like, I’d like to be able to view a Wynia style summary view of someone’s OPML file and click to subscribe to the ones that look interesting. [...]
Conversion Rater on 11/26/2005
A Great Use of OPML and the Yahoo API J Wynia from the Web 2.0 Workgroup has used OPML to create a dynamic “best of” list for the Web 2.0 Workgroup blogs. See his full detailed explanation here. It’s a simple but powerful use of OPML and a search engine to gauge popula...
Alex Barnett on 11/26/2005
manual trackback: http://blogs.msdn.com/alexbarn/archive/2005/11/25/497087.aspx and http://blogs.msdn.com/alexbarn/archive/2005/11/26/497133.aspx
Kimmo. » Hackeja ja hypeä on 11/27/2005
[...] OPML Sampling: Build a Page Showing the “Best” Item From Each RSS Feed [via] Leave a comment [...]
Library clips :: OPML Sampler: popular posts within an OPML :: November :: 2005 on 11/27/2005
[...] CrunchNotes points us to a post by J Wynia which describes how to extract the most popular posts from an OPML. Here is a sample from the Web 2.0 Workgroup’s OPML. [...]
FZ Blogs » Yuvarlak Hatlar Teorisi, Ruby, Java, Social Network 3.0, RSS Hacks, Technocrati Hacks, Complexity, vs. on 11/28/2005
[...] OPML Sampling: Build a Page Showing the “Best” Item From Each RSS Feed, [...]
RSS BLOGGER on 11/29/2005
Best of OPML and RSS: If you have been reading the RSS-Blogger for a while, then you already know what OPML files are. They are really far too good to be used only for listing RSS feeds, but you can learn more about what else is possible from Dave Winer or the OPMLManager ...
Billy on 1/12/2006
I have plugged in my appid, the opml file's URL, and uploaded all 3 files to the root of my domain. What do I need to do to see the output? Sorry for bothering, thx for contributing.
J Wynia on 1/16/2006
Have you opened the URL to the PHP file in your browser?
Billy on 1/17/2006
I prolly did something wrong. The XML.php and the Yahoo.php return blank pages. The index.php returns your banner up top and that is it. http://billy-girlardo.com/OPMLSampler/index.php
Billy on 1/17/2006
I think my whole problem is because Magpie is not extracting correctly for me and what does extract, my host isn't accepting all of. On top of that, I'm not sure where to put it anyway. I was just putting the whole DIR into my domain root..., sorry to involve you about my problems. I'm sure your script is great. Maybe when I get out of kindegarden lol.
J Wynia on 1/17/2006
That would definitely do it. When combined with the default error_reporting of most hosted sites (hide all errors), you get nothing but white pages. This site sits on a dedicated server, I have complete control, and I don't always know what settings will work on virtual host setups. Good luck.
tech.kynikeren » Om informasjon og blogging on 1/29/2006
[...] The solution was to let each individual be their own master over which blogs/information providers they want information from, in other words a personal edition of the aforementioned ping sites. Initially, and at the first level, this is done by subscribing to the {rss,atom} feeds one finds interesting; next, and at the next level, comes the ability to filter the posts by relevance, popularity, and specific criteria. (J Wynia has made a fascinating prototype in that regard.) [...]
Magpie Blog » Blog Archive » OPML Sampling on 2/6/2006
[...] J Wynia shows how to capture the gestalt of an OPML file using Yahoo APIs, and Magpie with OPML sampling [...]
Mark Woodward on 2/22/2006
I have the same problem as Billy...I get your header just fine, then lots of whitespace. I am (relatively) sure that the path to Magpie is correct, it looks like /public_html/feed2js/magpie, but wonder if the public_html is a symbolic link? Mark
J Wynia on 2/22/2006
Most "public_html" directories *are* symlinks and not real directories. Usually, if you crank the error_reporting level to E_ALL, and include something that won't work, you'll get to see the real path. For instance, on this server, there are actually something like 3 symlinks to get to the real path, including an arbitrary "site28" type structure that you just have to figure out, unfortunately.
Mark Woodward on 2/22/2006
Thanks, I'm in the process of tracking this down now...I appreciate the help.
Ole Christian Enger » Om informasjon og blogging on 3/18/2006
[...] The solution was to let each individual be their own master over which blogs/information providers they want information from, in other words a personal edition of the aforementioned ping sites. Initially, and at the first level, this is done by subscribing to the {rss,atom} feeds one finds interesting; next, and at the next level, comes the ability to filter the posts by relevance, popularity, and specific criteria. (J Wynia has made a fascinating prototype in that regard.) [...]
Kynikeren » Blog Archive » Om informasjon og blogging on 3/18/2006
[...] The solution was to let each individual be their own master over which blogs/information providers they want information from, in other words a personal edition of the aforementioned ping sites. Initially, and at the first level, this is done by subscribing to the {rss,atom} feeds one finds interesting; next, and at the next level, comes the ability to filter the posts by relevance, popularity, and specific criteria. (J Wynia has made a fascinating prototype in that regard.) [...]
Ben on 5/4/2007
Thanks for this solution J.
pat on 3/25/2007
nice work. I'll try it
Roma on 9/20/2007
I would like to put this script on my blog, but magpie is not in the zip file, where can I get it ? Thank you.
© 2003- 2014 J Wynia. Very Few Rights Reserved. This article is licensed under the terms of the Creative Commons Attribution License. Quoted content or content included from others is not subject to that license and defaults to normal copyright.