The Power of the Proof of Concept

Feb 17, 2008

If you were to go through my Visual Studio "Projects" directory or my personal development web server, you'd find that about two-thirds of the directories are named SomethingPOC or SomethingExperiment. The former has become my convention over the last year or so and stands for Proof of Concept.

I spend a good portion of each day with one or more of these projects open. That's because I consider the use of the Proof of Concept to be integral to software development. Whether the official methodology of the project encourages them, is indifferent or actively discourages them (and I've worked in all of those environments), I will insist on using them.

First, a quick explanation of what exactly I mean by a POC. Basically, it's the simplest possible program that will answer a question that you have about the tasks in front of you.

Say, for instance, that you wanted to retrieve an RSS feed and store the individual entries in a SQL Server database. I'd probably do a quick POC to connect to the database and insert a record. Then I'd do one for fetching an RSS feed. I might also do one that checks a feed for new items versus ones seen before.
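As a rough sketch of what that last POC might look like (this is an illustration, not code from an actual project): the class name, the method name and the inline sample feed are all made up, and it assumes each item carries a guid element. The feed is a hard-coded string so the question "can I tell new items from seen ones?" gets answered without touching the network or a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical POC name; deliberately not something a real project would use.
public static class NewFeedItemsPOC
{
    // Returns the <guid> of every <item> in the feed that is not already in seenGuids.
    public static List<string> FindNewGuids(string rssXml, ICollection<string> seenGuids)
    {
        XDocument feed = XDocument.Parse(rssXml);
        return feed.Descendants("item")
                   .Select(item => (string)item.Element("guid"))
                   .Where(guid => guid != null && !seenGuids.Contains(guid))
                   .ToList();
    }

    public static void Main()
    {
        // Inline sample feed so the POC runs without a network connection.
        string rss =
            "<rss version=\"2.0\"><channel><title>Sample</title>" +
            "<item><title>First</title><guid>post-1</guid></item>" +
            "<item><title>Second</title><guid>post-2</guid></item>" +
            "</channel></rss>";

        // Pretend post-1 was stored on an earlier run.
        var seen = new HashSet<string> { "post-1" };

        foreach (string guid in FindNewGuids(rss, seen))
        {
            Console.WriteLine("New item: " + guid);
        }
    }
}
```

No error handling, no validation, one question answered. In a real project the seen list would come from the database, which is a separate POC.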

In other words, each POC tests out one concept and proves that you can do what was a question mark in your project approach. If you end up with more than one method in a POC, you're probably doing too much and it should be broken down into more than one POC.

I deliberately name the projects and classes with names that can NOT work in the final project. This helps to hedge against the inclination to do a quick copy and paste into your real project. This is important because POCs should be quick and loose. They don't have error handling, don't do validation (unless that's what you're proving) and generally don't follow many of the rules of good software development. That's a good thing.

That freedom means you can quickly explore the problem and work through some possible solutions in a "sandbox" without worrying about whether you're doing it "right". However, it's also a good thing that you throw the POC away or only use it as a reference.

If you do lots of POCs, name them to encourage disposable coding and then move on to your "real" development, you'll find that you've left those crappy early mistakes behind in the POC and have already run into and overcome many of the typical problems that crop up in new solutions.

Once you're into your "real" development, lots of people abandon POCs. However, I keep using them throughout the project (even into the bugfixing and testing phases). Every time I ask myself whether an idea will work, I spin up a POC rather than trying the experimental code in the permanent code.

Over time, this ends up being a constant cycle. You ask yourself a question that can only be answered with code, do a POC to come up with an answer and then move back to the full project to implement it. If you aren't used to this kind of cycle, I'd recommend giving it a shot. I won't work without it.

Using HTMLTidy to Clean Up HTML with C#

Feb 14, 2008

For a while now, I've had a project on the back burner for a different set of tools for RSS reading, writing and publishing. I'd like a single toolchain that lets me keep everything together in one place. I've got piles of notes, a few proof of concept projects and the start of several of the components.

Last night, when I couldn't sleep, I decided to check off something that I wanted to see as a proof of concept for the Atom Publishing Client part of the toolchain: using HTML Tidy to clean up HTML into XHTML before putting it into an Atom entry document.

I currently do most of my writing for this site in Windows Live Writer. However, that's more of a compromise than an ideal choice. While I could probably hack a plugin together that would make Live Writer a more suitable long-term choice, I really want a very specific set of features that includes getting away from the XML-RPC API that all of the server-side engines Live Writer works with are based on.

So, I've been tinkering with a multi-tabbed Windows app for editing posts. The WYSIWYG tab for quick editing uses the MSHTML engine from Internet Explorer. I've looked around and unless you're willing to pony up $299 for a commercial control, that's the most reasonable choice.

However, the HTML that MSHTML spits out is horrible and really needs to be cleaned up. So, I set out to figure out how to use HTMLTidy in a C# project.

I tried to find a .NET wrapper for HTMLTidy and thought I had scored right away when I found one here. However, when I tried to use it, not even the sample code would build without errors on my development machine.

So, I dropped back to trying the COM object version. The last update to it was back in 2000 or so, but it looked like all of the features I needed were in that version, so I decided to give it a shot.

To use the TidyCOM library, you add it as a reference and insert your "using TidyCOM" statement in your class. The actual usage is fairly straightforward.

Example:

// Configure Tidy to emit strict, auto-indented XHTML without <font> tags.
TidyObject TidyObj = new TidyObject();
TidyObj.Options.Doctype = "strict";
TidyObj.Options.DropFontTags = true;
TidyObj.Options.OutputXhtml = true;
TidyObj.Options.Indent = TidyCOM.IndentScheme.AutoIndent;
TidyObj.Options.TabSize = 2;

// Run the cleanup entirely in memory: string in, string out.
String CleanHTML = TidyObj.TidyMemToMem(HTML);

That code assumes that the "HTML" variable has your messy HTML in it and at the end, "CleanHTML" has your cleaned up XHTML in it.

My little multi-tabbed prototype is using a buffer object to keep the "current" HTML in it. Whenever you switch tabs, the old content is scrubbed through this code before the new tab gets updated out of the buffer. That means that whether it's the WYSIWYG tab that messes it up or you in the HTML editor, you still get valid XHTML in the eventual output.

I also extended my CleanupHTML method (which contains the above code) to scrub out the HTML header tags, body tags, etc. The HTML will end up as one part of the Atom XML file rather than as a standalone HTML file, so I only want the content from the editor. Both MSHTML and HTML Tidy will always put that wrapper back in unless you strip it out.
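A minimal sketch of that stripping step, assuming the cleaned-up document has exactly one body element and no ">" characters inside the body tag's attributes. The class and method names here are hypothetical, and the regular expression is a shortcut chosen for illustration rather than the approach the prototype necessarily takes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical POC name, for illustration only.
public static class BodyExtractPOC
{
    // Captures everything between <body ...> and </body>, across lines, ignoring case.
    private static readonly Regex BodyPattern = new Regex(
        @"<body[^>]*>(.*?)</body>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the inner content of <body>, or the input unchanged if no <body> is found.
    public static string ExtractBodyContent(string xhtml)
    {
        Match match = BodyPattern.Match(xhtml);
        return match.Success ? match.Groups[1].Value.Trim() : xhtml;
    }

    public static void Main()
    {
        // Stand-in for what a Tidy pass might hand back: a full document wrapper.
        string tidied =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n" +
            "<head><title></title></head>\n" +
            "<body>\n<p>Hello, <em>world</em>.</p>\n</body>\n</html>";

        Console.WriteLine(ExtractBodyContent(tidied));
    }
}
```

Because the input has already been through Tidy, it is well-formed XHTML, so loading it into an XML parser and pulling out the body node would be a sturdier alternative if the regex ever proves too fragile.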

While I'd still like an assembly that's a little more current, this clearly does the job well enough to check this feature off of my checklist. Now, on to RESTful services on IIS with C#.

C# DataSets and the Magic of ReadXML

Jan 31, 2008

I've worked on several applications where we used .NET DataSets as the container for passing records between web services and other components. They work pretty well to keep things nice and loosely coupled when you're building lots of separate components that may or may not all be using the same language, etc.

One of the greatest things that they included in the DataSet classes is the ability to read and write them to XML files. That gives you not only an interchange format, but a file-based version of it pretty quickly. You can easily use those files as your "gold standard" for building all of the components at once. As long as each component emits and consumes that sample file, things are golden.

Anyway, one of the side benefits of that ability to read/write those XML files is that it isn't limited to the DataSets you create in code. The ReadXml() method will actually convert nearly any XML file into a DataSet. That can come in really handy when your entire application is already passing DataSets around.

That's because nearly any application of reasonable size pulls in information from somewhere outside of the control of your code. In many of those cases, that data will be in XML format. You can, therefore, use ReadXml() to get DataTable access to all kinds of useful XML stuff.

When it gets read in, .NET does some pretty cool automatic stuff, like creating identifier columns on your tables, etc. However, if, unlike the "normal" DataSets, your imported XML data is nested 2-3 or more levels deep, it can be kind of hard to predict exactly what the DataTable structure will look like.
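To make the nesting concrete, here is a small sketch with a made-up two-level document (the library/book/author element names and the class name are hypothetical). It loads the XML from a string and prints each relation that ReadXml() inferred, along with the generated key columns that join parent to child.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical POC name and sample data, for illustration only.
public static class NestedXmlPOC
{
    // Lets ReadXml() infer the tables, columns and relations from the XML itself.
    public static DataSet Load(string xml)
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(xml));
        return ds;
    }

    public static void Main()
    {
        string xml =
            "<library>" +
            "<book><title>First</title><author><name>Ann</name></author></book>" +
            "<book><title>Second</title><author><name>Bob</name></author></book>" +
            "</library>";

        DataSet ds = Load(xml);

        // Each nested level shows up as a DataRelation joined on a generated key column.
        foreach (DataRelation relation in ds.Relations)
        {
            Console.WriteLine(
                relation.RelationName + ": " +
                relation.ParentTable.TableName + "." + relation.ParentColumns[0].ColumnName +
                " -> " +
                relation.ChildTable.TableName + "." + relation.ChildColumns[0].ColumnName);
        }
    }
}
```

As I understand the inference rules, the nested author element becomes its own table, and a generated book_Id column is added to both the book and author tables to tie them together. With deeper nesting, each extra level adds another table and another generated key, which is where the structure gets hard to predict by eye.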

I'm not a huge fan of automatic or "magic" methods, because you usually have absolutely no way to see inside the black box. That's not the case here because, while the method does some pretty cool magic, it is still possible to see inside of what it does.

I decided that I needed something to deal with the black box today and after dinner tonight, I wrote a quick console app to take an arbitrary XML file and dump out all of the tables, columns and rows in the DataSet in a way that makes it more clear how you'll need to use the tables to grab the data you're after.

That's the information you're going to need to establish your DataRelation objects to tie things together. It's been fairly illuminating for the few files I've sent through it so far and I'm thinking this will be a permanent part of my utility folder.
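For a sense of how little code such a dump takes, here is a minimal sketch of the idea. It is not the utility described above: the class name, method name and output layout are assumptions made for illustration. It takes the path to an XML file as its only argument and writes every table, column and row to the console.

```csharp
using System;
using System.Data;
using System.Text;

// Hypothetical name and output format; a sketch of the idea, not the original utility.
public static class DataSetDumpPOC
{
    // Builds a plain-text listing of every table, its columns and its rows.
    public static string Describe(DataSet ds)
    {
        StringBuilder sb = new StringBuilder();
        foreach (DataTable table in ds.Tables)
        {
            sb.AppendLine("Table: " + table.TableName + " (" + table.Rows.Count + " rows)");

            foreach (DataColumn column in table.Columns)
            {
                sb.AppendLine("  Column: " + column.ColumnName + " [" + column.DataType.Name + "]");
            }

            foreach (DataRow row in table.Rows)
            {
                sb.Append("  Row:");
                foreach (DataColumn column in table.Columns)
                {
                    sb.Append(" " + column.ColumnName + "=" + row[column]);
                }
                sb.AppendLine();
            }
        }
        return sb.ToString();
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("Usage: pass the path to one XML file.");
            return;
        }

        DataSet ds = new DataSet();
        ds.ReadXml(args[0]); // Infers the schema from the file's contents.
        Console.Write(Describe(ds));
    }
}
```

Keeping the formatting in Describe() and the file handling in Main() means the same listing can be produced for a DataSet that came from somewhere other than a file.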

I run it from PowerShell and pipe the output to a file with Out-File, giving me a record of that schema (which I find much easier to read than the output of the WriteXmlSchema() method).

In case you'd like to use it as well, here's a copy of the code.

HTML as Page Layout Language

Dec 28, 2007

Off and on over the last 6-8 months, I've been working on a project that needs PDF as its final output format. The plan has been to use DocBook and the toolchain attached to it. However, that's been more frustrating than it first looked when it comes to integrating into the whole system I'm designing.

Then, earlier today, someone posted a link to this YouTube video, which demos the functionality of the Prince engine. That revealed a system for really nice page layout using HTML and CSS (with CSS3 handling the page breaks and other stuff like it was designed to, making Prince the only implementation of CSS3 out there that works as far as I know).

Given that my project is web-based, being able to just keep it all HTML from end to end and still get really nice PDFs out the other end would be a huge benefit. And, given that this project will be commercial and how much time I've already spent trying to do all of the conversions back and forth, even the steep price tag for a server license will likely be a net bargain.

Fortunately, the version that puts a little logo in the top-right corner of the PDF (only for display, not printing) is free for development/personal use. So, I messed around with that a bit tonight and got a feel for it. There are versions for pretty much all of the platforms (Windows, Mac, Linux, BSD, etc.) and integration with code for automatic generation is fairly easy.

Really basic conversion using C# only took 3 lines of code. I just grabbed the normal Windows version, also downloaded the DLL and added that DLL to a basic console app.

Then, these 3 lines work to dump out a PDF of the page in question. I just threw together a quick HTML document to test with a few H1s, paragraphs, etc.

IPrince pr = new Prince(@"C:\Program Files\Prince\Engine\bin\prince.exe");
pr.AddStyleSheet(@"C:\Program Files\Prince\Engine\style\xhtml.css");
pr.Convert("demo.html", "demo.pdf");

Pretty easy startup as far as I'm concerned. The video is worth watching, despite being somewhat irritating to watch. Like many presentations to a room full of geeks, there's quite a bit of not seeing the forest for the trees. Lots of people shooting it down with "this would be REALLY great if it supported my one pet feature" kind of stuff. They got a bit hung up on those little nit-picking details and I wonder how much of the presentation ended up left out as a result.

Based on what I've seen so far, I definitely think it's worth tinkering with a bit more and doing the math on that license fee as part of my project budget.

Software Development and Alchemy

Dec 17, 2007

Photo: Stian Martinsen

In several recent conversations with other software developers (yep, those are just as exciting as your wildest dreams) about their frustrations with the process, as implemented in modern corporate America, the same analogy kept popping into my head.

More and more, I feel like the things that businesses are after in their software development are similar to medieval alchemy. For 2500 years, the entire field that eventually became chemistry was obsessed with 3 basic questions:

  1. How can we change lead (or other metals) into gold?
  2. How can we create an elixir that will cure all diseases and prolong life indefinitely?
  3. Can we discover a universal solvent?

All of these strike us as goals that weren't even attainable. Yet, the underlying desires often did get met when the focus shifted to what eventually became modern chemistry. By dropping the focus on the single, universal solution and just figuring out how to treat individual diseases or how to dissolve individual compounds or just fundamentally understand chemistry, many advances did happen.

Many/most of the diseases that the alchemists sought to cure or treat are under control today. There's very little in the world of chemistry that we can't tear apart and we can do things like convert coal or corn into one of the most sought after substances on earth: liquid fuel for transportation.

One of the consulting firms I worked with had a project manager who was constantly pushing the developers to find and use "automagical" tools to build our solutions. What he was after was the kind of IDE or tool that, with a few clicks, would just spit out a nearly complete solution.

That would, of course, result in the sales force being able to sell expensive solutions that could be fulfilled in minutes instead of days and weeks. It didn't matter how often I pointed out that, as a consulting company, if our clients' solutions were so simple that a few clicks and config options could solve them, they wouldn't bother coming to us: they'd just buy the software themselves.

This same person wasn't very excited about things like loosely-coupled systems and/or Service Oriented Architecture unless they also came with wizards that let you choose 4 or 5 options and they'd just spit out a fully-realized application. Yet, those approaches keep working for me as a way of looking for patterns in companies' problems and solving them quickly and completely.

Instead of looking for the tool that spits out C#, PHP, ColdFusion and Ruby, I'm looking for repeating problems like managing queues of objects to be processed. Once you have an approach to that general problem, a good developer can probably implement it in whatever language they're most comfortable with.

That's due, in large part, to the fact that the bulk of the work as a software developer is NOT in typing in the text of the programming language in question. Douglas Crockford said in one of his Yahoo video lectures something along the lines of: a developer could probably type up all of their code for an entire year in a day or 2.

Yet, many of these automagical tools really only seem to automate the stuff related to typing code, not for solving problems. And, like I said a couple of days ago, if you're in the consulting game or just looking to stay employed as a developer, the money and jobs are where the problems are.

That's why, when I hear someone looking for that quick and easy tool that will "just" take care of it this afternoon, I tend to interpret it as, "Can't we just change this lead into gold instead of getting real gold?"


J Wynia

For better or worse, I'm the guy who runs things here. I'm a web consultant, software developer, writer and geek from Minneapolis, MN. This site is a fairly wide cross-section of the things I'm interested in and enjoy writing about.

© 2003-2008 J Wynia. All original content is licensed under the terms of the Creative Commons Attribution license unless otherwise noted. Content from other sources is licensed under its original terms.