Folksonomies and Getting Beyond Them: Organizing my SXSW Thoughts
This post is an attempt to organize my thoughts for the Beyond Folksonomies panel on Saturday at SXSW.
To understand folksonomies, we actually have to create one. The whole topic only makes sense if we understand what is meant by "folksonomy". The term comes from "folk" and "taxonomy". Taxonomies are ways of naming things and folk are people, making folksonomies the people's way of naming things.
The term is relatively new and has cropped up to describe the "tagging" of pages, photos, videos, audio, etc. with keywords that describe the content. So, for instance, this photo that I have on Flickr:

is tagged with words like: minneapolis, sunrise, stoplight, skyline, cold, traffic, minnesota, and bus. Those were the words that *to me* described the photo, several of which require knowing where I took it. To someone else, other words like: city, urban, stopped, etc. might make more sense and would be the words they would choose.
This is where both the power and the difficulty of folksonomies lie. However, the problem isn't exactly new.
Folksonomies are actually the natural state of humanity. As long as we've had communication skills, people have been making folksonomies. They form the very structure of languages. As words are needed to describe things, actions, attributes, etc. they are created and shared. Within a community, a common vocabulary is generally agreed upon pretty quickly.
This all worked out pretty well for several thousand years. Since there wasn't much individual mobility, the communities and their folksonomies/languages covered all of the people who might need to know. And, those who did travel widely picked up several and got along pretty well.
Along the way, several people started talking about how great it would be for things like biology if, when a German and a Brit were talking about their dogs, instead of the confusion caused by calling them "dog" or "Hund", they could have a single term used by all scientists: Canis familiaris.
In other arenas, the same thing happened. Writers in English used to spell things however they wanted to. Chaucer, Shakespeare and others spelled things multiple ways, made up words, etc. However, the grammarians and authors of dictionaries had much the same thought as the biologists: it might be great if everyone who spoke a given language (like English or French) could share a common understanding of how each word should be spelled and what it means.
Over time, these centralized taxonomies proved their worth and flourished. With structured organizations behind them, they were and are maintained, governed and kept neat and tidy. Today, these are all over the place and we consider them part of the "way things are".
However, they only really exist where a centralized organization controls things. If you should wish to name a new species of animal, you have to follow the rules and conventions that have been laid out before. And, if your species isn't actually new, you'll get the biologist smackdown.
Therein lies part of the problem with centralized taxonomies. Because you can only have one when you've got a "central", they don't work very well when either the rules can't easily be defined (like my photo) or when there isn't a big motive for there to *be* a "central".
Even if there were a structured naming convention for photos, I'd be unlikely to follow it for tagging photos like that one. It wouldn't be worth it to me to learn the rules and follow them on every photo.
Which is how folksonomies made their resurgence. Tagging is easy. Single words attached to URLs, photos, etc. with no structure to speak of.
Suddenly, lots of people were enjoying the freedom that comes with naming things however you feel like. Instead of worrying about whether you've used the right term for something, you just tag it with the word you use in your head.
The barrier to entry for creating folksonomies is low, which makes them quick and easy to build for yourself or whatever community you belong to. Beyond that, there's a personal incentive: tagging makes it easier to retrieve your own photos, bookmarks, etc. Those two things combined have led to an explosion in folksonomy creation.
Unfortunately, once a folksonomy reaches a certain size or gets exposure beyond a small community, the growing pains start to show up. You use a different term this week for something than you did 3 weeks ago. When searching for photos, you can't remember if you tagged those photos with "dog" or "dogs". This doesn't even get into whether it should be "dog" or "Hund".
So, there are growing pains. What are we to do? Go back to centralized taxonomies? I've seen more than a few articles basically suggesting that. They put forth rules and guidelines for how to manage your tags, suggested structures, etc. These are doomed to failure as the people who drove much of the folksonomy explosion (and there's lots more still to come as the general population discovers the concept) did it ONLY because they didn't have to learn or keep track of the rules.
However, we still need a way to manage the ever-growing mountain of data. The search engine market has tried structured directories (DMOZ and early Yahoo), brute-force engines (AltaVista) and even pretty good searches like Google, which still has only one number one result for "Java" (is it the island or the programming language?). If that history has shown us anything, it's that no one approach by itself solves the information mountain problem.
Attempting to create yet another directory of items, each in a neat category, just doesn't scale. And reducing the entire mountain to a single dimension doesn't work either.
So, where do the solutions lie?
To me, the key is in how we, as humans, have handled it for millennia. We make connections between the terms that make up our overlapping folksonomies. We know, as Americans, that when a Brit says "lift", the term matches up to "elevator". We make the connection that something tagged as "dogs" is the same thing as something tagged as "dog". We get this stuff because the human brain is great at making connections.
Well, computers are getting pretty good at this too. Especially when we train them. Show a computer running an appropriate algorithm a whole pile of tags, grouped together how users entered them, and the computer can ferret out the patterns.
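To make that a little more concrete, here is a minimal sketch of one simple way a program could ferret out such patterns: counting how often two tags are applied to the same item. This is only an illustration, not any particular site's algorithm. The function name `related_tags`, the `min_count` threshold and the sample tag sets are all made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_tags(tag_sets, min_count=2):
    """Map each tag to the tags it shares an item with at least min_count times.

    tag_sets is an iterable of tag collections, one per tagged item
    (hypothetical input format, chosen for this sketch).
    """
    pair_counts = Counter()
    for tags in tag_sets:
        # Count each unordered pair of tags once per item.
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1

    related = defaultdict(list)
    for (a, b), count in pair_counts.most_common():
        if count >= min_count:
            related[a].append((b, count))
            related[b].append((a, count))
    return dict(related)


# Made-up sample data: each set is the tags one user put on one photo.
photos = [
    {"minneapolis", "skyline", "sunrise"},
    {"minneapolis", "skyline", "bus"},
    {"dog", "hund", "park"},
    {"dog", "hund"},
]

print(related_tags(photos))
```

With this sample data, "dog" and "hund" end up linked because they show up together twice, while one-off pairings like "dog" and "park" fall below the threshold. A real system would need far more data and a smarter measure than raw counts, but the principle is the same: nobody had to declare the relationship, it fell out of how people tagged.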
All of this is why I'm interested in attention recording as well as feed subscription, spam filtering algorithms, etc. I think that the problems folksonomies solve (finding new content, describing content, retrieving content and sharing content) can be tackled at a much larger scale through a multi-pronged approach.
- Capture data while letting people do things comfortably. This means low effort things like attention recording, no rules on tagging, "no" effort bookmarking, etc.
- Create multi-dimensional results for searches. Things like "here are the top 5 results for the term you searched for, but here are 3 related terms you've tagged these next 5 things with".
- Analyze captured data for relationships and store those relationships for use throughout.
- Provide easy feedback mechanisms on "found" content you provide out of such a system. If a recommendation is made for a given feed and I don't like it, the ability to downgrade the recommendation needs to be effortless.
- Build the backend for the geeks (with access to data and APIs) and the front end for regular people (zero effort and ease of use). The geeks will build the interface they want anyway.
- Leverage all of the non-folksonomy methods for storing, analyzing and presenting information. Scalable folksonomies will NOT be possible if you shun the old ways entirely. Fulltext search solves some problems in very useful ways. Outlines are very useful ways of *presenting* data.
- Learn to live with microslop in exchange for macroprecision. Any one person's folksonomy is likely to be full of typos and errors. However, on the whole, there is a giant leap forward in data that can be retrieved this way.
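As a rough sketch of the "multi-dimensional results" idea from the list above, a search could return the direct hits for a term and then, separately, hits for terms already known to be related. Everything here is an assumption made for illustration: the `search` function, the shape of `items`, and the hand-written `related` map, which in practice would come out of the relationship analysis step.

```python
def search(items, term, related, limit=5):
    """Return (direct hits, hits grouped by related term) for a tag search.

    items maps an item name to its set of tags; related maps a tag to a
    list of related tags. Both structures are hypothetical.
    """
    direct = [name for name, tags in items.items() if term in tags][:limit]
    seen = set(direct)

    by_related = {}
    for other in related.get(term, []):
        # Skip items already shown so each result appears only once.
        hits = [name for name, tags in items.items()
                if other in tags and name not in seen][:limit]
        if hits:
            by_related[other] = hits
            seen.update(hits)
    return direct, by_related


# Made-up sample data.
items = {
    "photo1": {"dog", "park"},
    "photo2": {"dogs"},
    "photo3": {"hund", "berlin"},
    "photo4": {"skyline"},
}
related = {"dog": ["dogs", "hund"]}

direct, extra = search(items, "dog", related)
print(direct)  # items tagged "dog"
print(extra)   # items found only through related tags
```

The person searching never had to know whether the photo was tagged "dog", "dogs" or "hund". The sloppy individual tags stay sloppy, and the connections between them do the work.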
Overall, the above is kind of an overview of my thoughts on folksonomies. It's an ongoing and evolving view that I change as more information comes in. But, many of the key bits stay the same and are emerging as the primary drivers of my take on this topic.
For those of you who will be at SXSW 2006, you can drop in on our panel on Saturday morning (first panel session) and the rest of you can watch BeyondFolksonomies.com for more info on this topic.
