Yahoo Keyword Extractor

Sep 07, 2005

As part of the RSS aggregator research and experimentation that I'm doing, I've been looking at tons of web service APIs. The Yahoo API includes the regular stuff you'd expect: search APIs, etc. However, it also has something I haven't seen in any of the other APIs: keyword extraction. It takes a chunk of text and gives you the significant keywords from that text. Given the popularity and utility of the social bookmarking and social tagging going on, the importance of keyword analysis is hard to overestimate.

As part of my new template for this site, I looked at including the Yahoo keywords as part of the metadata for a posting. It isn't perfect by any means, but it generally does a good job as a first-pass filter to narrow things down. And, for things like search engines, it's as good as many of the keyword meta tags I see. You can see the results on my new template construction zone (warning: it breaks on a regular basis as I work on it). Just mouse over the Yahoo keywords link at the bottom of the posting to see the keywords Yahoo chose.
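For the search engine case, one option is to drop the keywords into a standard keywords meta tag. This is only a minimal sketch: the helper name keywords_meta_tag is made up for illustration, and it assumes you already have the keywords as a comma-separated string.

```php
<?php
// Hypothetical helper (not part of the site's real template): turn a
// comma-separated keyword string into a keywords meta tag.
function keywords_meta_tag($keywords) {
    // Escape quotes and angle brackets so a keyword can't break out of the attribute.
    $safe = htmlspecialchars($keywords, ENT_QUOTES, 'UTF-8');
    return '<meta name="keywords" content="' . $safe . '" />';
}

// Example keywords are invented for illustration.
echo keywords_meta_tag('rss aggregator, social bookmarking'), "\n";
```

The escaping matters because the keywords come back from a remote service, so they shouldn't be trusted to be safe inside an HTML attribute.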

Here's a quick PHP function I used to make it work. You'll need your own appid from Yahoo to use it. It's also just a quick hack and doesn't properly


function suggest_keywords($content){
    // Yahoo term extraction endpoint.
    $url = "http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 1);
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // URL-encode the text so characters like & and = can't break the POST body.
    curl_setopt($ch, CURLOPT_POSTFIELDS, "appid=REPLACE_ME_WITH_YOURS&query=null&context=" . urlencode($content));
    $result = curl_exec($ch);
    curl_close($ch);
    // Keep only the result tags, then split on the boundary between them.
    $stripped = strip_tags($result, '<result>');
    $pieces = explode("</result><result>", $stripped);
    // Build a comma-separated list of the bare keywords.
    $output_string = "";
    foreach ($pieces as $tag){
        $cleantag = strip_tags($tag);
        if($output_string){
            $output_string = $output_string . ", " . $cleantag;
        } else {
            $output_string = $cleantag;
        }
    }
    return($output_string);
}
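If you'd rather not lean on strip_tags and explode, the response can be read with PHP's DOM extension instead. What follows is a rough sketch, not what this site runs: parse_keywords is a name I made up, and the sample XML assumes the service wraps each keyword in a Result element inside a ResultSet, so check a real response before relying on it.

```php
<?php
// Hypothetical helper: pull the text of every <Result> element out of a
// response string. Returns an empty array when there is nothing usable,
// e.g. when the request failed and curl_exec() returned false.
function parse_keywords($xml) {
    if (!is_string($xml) || trim($xml) === '') {
        return array();
    }
    $doc = new DOMDocument();
    // Suppress parser warnings; a broken response just yields no keywords.
    if (!@$doc->loadXML($xml)) {
        return array();
    }
    $keywords = array();
    foreach ($doc->getElementsByTagName('Result') as $node) {
        $keywords[] = trim($node->textContent);
    }
    return $keywords;
}

// Assumed response shape, for illustration only.
$sample = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<ResultSet><Result>rss aggregator</Result><Result>social bookmarking</Result></ResultSet>';

// prints: rss aggregator, social bookmarking
echo implode(", ", parse_keywords($sample)), "\n";
```

Returning an array rather than a ready-made string leaves it up to the template whether to join the keywords with commas, link them, or cap how many are shown.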

 

Comments on this post

Feedback is always welcome. Read some from other folks or leave your own below. Just keep things civil and remember that what you post lives on in public. Forever.

Thanks,
J

2 Responses to “Yahoo Keyword Extractor”

  1. Lucrando na Rede » Mais ML Contextual Says:

    [...] Following Kenji's suggestion, I did some searching on the Y! API and, with the help of Yahoo!'s own search engine, ended up finding the article Yahoo Keyword Extractor, which provided the basis for creating the new alpha version of the script. [...]

  2. Russel Says:

    I have been using it for about 1.5 years now and ran into two problems with it. 1) It limits an IP to 5,000 queries per day - not a problem for most sites, but for large dynamic ones like mine it is a routine problem. 2) Bad error handling when it breaks or times out.

    But it's pretty darn effective for high volume dynamic sites.

    Because of the 5k per day limit I researched more keyword APIs and found that MSN has one (though I haven't looked into it yet) and also found this one: http://www.alchemyapi.com/ - its free level service allows up to 10k queries per day… the paid services start at 50k per day and go up. Between Yahoo and Alchemy (and any other free API services) you could string together a series of keyword API services to be used in succession when the previous one hits its max for the day… as I am now doing. :-)


© 2003-2009 J Wynia. All original content is licensed under the terms of the Creative Commons Attribution license unless otherwise noted. Content from other sources is licensed under its original terms.