OPML to YAML with XSLT for Symfony

Jan 02, 2007

You're going to have to get your own crackers to accompany the alphabet soup required to accomplish the task at hand: OPML to YAML with XSLT.

Yesterday, Aaron and I used the holiday to do some work on our respective symfony projects. Mine (the RSS "desktop" I'm working on, called Synfed) calls for either:

  1. me typing in the URLs, titles and descriptions for tons of feeds.
  2. pulling in that same information from OPML files.

Being fundamentally lazy when faced with menial data entry, I opted for Plan B. Since I know I'm going to be re-generating the initial feed tables for this app dozens if not hundreds of times before it stabilizes, I wanted to have a quick import for symfony to chew on. That meant a YAML file.

YAML is a pretty quick-n-dirty format that beats XML for these kinds of tasks, particularly if Plan A is your only real option (hand-typing XML sucks). However, XML as a source format works wonderfully if you want to go to pretty much any other text-based format, since that's what XSLT was *made* for.

XSLT, among those who only have a passing knowledge of it, gets painted as being about XML to HTML conversions. But its real intent and purpose was XML to other XML, or XML to any plain-text format. Got two XML files that have similar data, but not similar structure? Use XSLT to homogenize them.

Given the simplicity of the OPML format, I figured that this would be an easy job for XSLT. It was.

A few lines of XSLT later and I had my result. There were only a few tricky parts. First, symfony's data imports need each "record" to have a unique id. Since the data itself is a bit "iffy" as a source of valid record names, I just used XSLT's generate-id() function to generate a unique id for each feed entry in the OPML. Second was the OPML file itself. The first one I converted was my own subscriptions. However, it appears that lots of folks include special characters, smart quotes, line breaks, unescaped ampersands (the kryptonite of XML), etc. in their titles AND their descriptions. I just cleaned that stuff up manually until the parser was OK with it (though I'd prefer that OPML exports did that themselves, to ensure valid XML coming out). The last trick was whitespace. Because YAML is based on indentation and newlines, I had to tweak the stylesheet in some strange ways to get the output right.

However, it only took a few passes to get it to look right. You can see my XSLT (which is tied to my schema); it should give you an idea of how it works. Given the amount of data out there in XML format, this should prove useful for anyone doing work in symfony.
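For a rough idea of the shape, here is a minimal sketch of such a stylesheet. It is not the actual stylesheet mentioned above, just an illustration under some assumptions: the model name "Feed" and the column names "title", "url" and "description" are hypothetical stand-ins for whatever your own schema uses, and the OPML attribute names (title, xmlUrl, description) are the common ones that exporters tend to write. The helper template named "field" is likewise made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. "Feed", "title", "url" and "description" are assumed
     model/column names; substitute the ones from your own schema. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit plain text rather than XML. -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Ignore whitespace-only text nodes in the OPML source. -->
  <xsl:strip-space elements="*"/>

  <!-- One top-level key (the model name), then one record per feed.
       Only outlines carrying an xmlUrl are treated as feeds. -->
  <xsl:template match="/">
    <xsl:text>Feed:&#10;</xsl:text>
    <xsl:for-each select="//outline[@xmlUrl]">
      <!-- generate-id() gives each node a unique alphanumeric label. -->
      <xsl:text>  </xsl:text>
      <xsl:value-of select="generate-id()"/>
      <xsl:text>:&#10;</xsl:text>
      <xsl:call-template name="field">
        <xsl:with-param name="name" select="'title'"/>
        <xsl:with-param name="value" select="@title"/>
      </xsl:call-template>
      <xsl:call-template name="field">
        <xsl:with-param name="name" select="'url'"/>
        <xsl:with-param name="value" select="@xmlUrl"/>
      </xsl:call-template>
      <xsl:call-template name="field">
        <xsl:with-param name="name" select="'description'"/>
        <xsl:with-param name="value" select="@description"/>
      </xsl:call-template>
    </xsl:for-each>
  </xsl:template>

  <!-- Hypothetical helper: writes one indented, double-quoted YAML value.
       normalize-space() collapses line breaks and runs of spaces;
       translate() turns double quotes into single quotes and drops
       backslashes, so the quoted YAML string stays well-formed. -->
  <xsl:template name="field">
    <xsl:param name="name"/>
    <xsl:param name="value"/>
    <xsl:text>    </xsl:text>
    <xsl:value-of select="$name"/>
    <xsl:text>: "</xsl:text>
    <xsl:value-of select="translate(normalize-space($value), '&quot;\', &quot;'&quot;)"/>
    <xsl:text>"&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Every space and newline in the output comes from an xsl:text element, which is one way to keep the YAML indentation under control: whitespace between the other stylesheet elements is ignored by the processor. The labels that generate-id() produces differ from one XSLT processor to another, so don't rely on their exact values.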

I did the actual transformation using msxsl, the Microsoft XSL command-line tool, which I've used for as long as I've been doing XSLT. The command to run the transformation and get your YAML file is:

msxsl your.opml opml_to_yaml.xslt -o your.yaml

If you omit the "-o outputfile" on the end, it will dump the output right to the screen. Great if you're trying to recreate the Matrix "UI". Bad if you're actually trying to read the output.

 


© 2003-2009 J Wynia. All original content is licensed under the terms of the Creative Commons Attribution license unless otherwise noted. Content from other sources is licensed under its original terms.