Data Mining, Giant Piles of Data and Amazon

Dec 13, 2005

I believe the key to really good contextual filtering of content lies in combining the folksonomies that people are coming to know and love with data mining of huge piles of data. One of the biggest problems in testing that theory is that you need huge piles of data to dig through. That usually means gathering it yourself, which usually means it never gets done. Enter Amazon.

Amazon is going to grant access to a bunch of their raw data for web searching. It's not free, but the fees are reasonable and "indexed" for future increases. The fees are metered by usage: "per GB", "per hour of CPU time", etc. And, because the prices are set rather than "call us", it opens up possibilities for those of us who want to work with this kind of data but don't have the money to, say, ask Yahoo to let us run 100,000 or 1 million queries a day instead of 5,000.
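To make the point about set prices concrete, here's a rough sketch of the kind of back-of-the-envelope estimate that published per-unit fees make possible. The function name and both rates are made up for illustration; they are placeholders, not Amazon's actual prices.

```python
def estimate_cost(storage_gb, cpu_hours, rate_per_gb, rate_per_cpu_hour):
    """Estimate the bill for a job priced per GB and per CPU hour."""
    values = (storage_gb, cpu_hours, rate_per_gb, rate_per_cpu_hour)
    if any(v < 0 for v in values):
        raise ValueError("usage and rates must be non-negative")
    return storage_gb * rate_per_gb + cpu_hours * rate_per_cpu_hour


# Hypothetical rates in dollars, chosen only to show the arithmetic.
HYPOTHETICAL_RATE_PER_GB = 2.00
HYPOTHETICAL_RATE_PER_CPU_HOUR = 0.50

if __name__ == "__main__":
    cost = estimate_cost(
        storage_gb=50,
        cpu_hours=10,
        rate_per_gb=HYPOTHETICAL_RATE_PER_GB,
        rate_per_cpu_hour=HYPOTHETICAL_RATE_PER_CPU_HOUR,
    )
    print(f"Estimated cost: ${cost:.2f}")
```

With a "call us" price, there's nothing to plug into a calculation like this until you've had a sales conversation; with posted rates, you can budget an experiment before you start.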

Very cool and I'm very excited. And, FYI, if the possibility of data mining huge piles of data makes you grin from ear to ear too . . . you're definitely a giant geek, like me.

 


© 2003-2009 J Wynia. All original content is licensed under the terms of the Creative Commons Attribution license unless otherwise noted. Content from other sources is licensed under its original terms.