Archiving Full-Text Copies of Your Bookmarks

Mar 07, 2007

I bookmark a LOT of stuff, and much of it is stuff I know I'm going to want to find again. Tools like del.icio.us have made this quite a bit easier. However, for delicious to be of use, you usually have to augment each bookmark with relevant tags and a description. Even then, six months later when I actually want the information, I may be looking for it in a different context than the one I originally filed it under.

Given that I don't really want to do much more than hit a keystroke to indicate that I want to keep the page for further reference, the "normal" way of using delicious pretty much bugs me. And unless I use it that way, it becomes difficult to retrieve anything from it. What I've *really* wanted for a while is to just store a copy of each page I bookmark and then have full-text searching of that content.

In between the pain-killer fog, I thought of a quick way to accomplish this and tackled the problem with a short PHP script.

So, I took the basic code I wrote for doing a monthly delicious summary and modified it a bit. It still builds an HTML document listing the bookmarks for a given day. However, the whole setup is now based on a directory structure like this:

2007
     01
     02
     03
          01
          02
          03
          04
          05
          06
          07

Basically, there's a directory for each year, with months and days inside. Each day's directory holds the HTML summary as well as copies of the bookmarked pages, fetched with "wget". As before, the script uses the delicious API to get the bookmarks. It's intended to be run with the command-line PHP, not through the browser.
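The layout above can be sketched in a few lines. The PHP script itself isn't reproduced in this post, so this is only a Python illustration of the idea: the function names (day_dir, wget_command, summary_html) are made up for the example, and the particular wget options are an assumption about what a "copy of the page" needs, not necessarily what the script passes.

```python
import html
from datetime import date
from pathlib import Path


def day_dir(root: Path, day: date) -> Path:
    """Return root/YYYY/MM/DD, matching the year/month/day layout."""
    return root / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"


def wget_command(url: str, target: Path) -> list[str]:
    """Build (but do not run) a wget call that saves a page into target.

    The options are real wget options; whether the original script
    uses them is an assumption.
    """
    return [
        "wget",
        "--page-requisites",   # also fetch images/CSS the page needs
        "--convert-links",     # rewrite links so the copy works offline
        f"--directory-prefix={target}",
        url,
    ]


def summary_html(day: date, bookmarks: list[tuple[str, str]]) -> str:
    """Render a minimal HTML list of (title, url) pairs for one day."""
    items = "\n".join(
        f'<li><a href="{html.escape(url, quote=True)}">{html.escape(title)}</a></li>'
        for title, url in bookmarks
    )
    return f"<h1>Bookmarks for {day.isoformat()}</h1>\n<ul>\n{items}\n</ul>\n"
```

To actually archive a day, you'd create the directory with mkdir(parents=True, exist_ok=True), write the summary into it, and hand each command list to subprocess.run.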

php bydate.php 2007-03-07

The date argument can be anything that PHP handles via strtotime(), which means that:

php bydate.php yesterday

will work pretty much like you'd expect. That's what I'm intending to use in a daily cron job. That way, every day, the server will just grab and archive my daily bookmarks for me. I'm intending to put the archive under a full-text search engine like Lucene or Beagle, which will give me the better kind of retrieval that I'm after.
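For anyone wanting the same "date or yesterday" behavior outside PHP, here's a rough Python sketch. Python has nothing built in that matches strtotime(), so this handles only "today", "yesterday", and YYYY-MM-DD dates, which is a tiny subset of what strtotime() accepts. The parse_day name and the optional today parameter are inventions for the example.

```python
from datetime import date, timedelta
from typing import Optional


def parse_day(arg: str, today: Optional[date] = None) -> date:
    """Turn a command-line date argument into a date.

    Accepts "today", "yesterday", or an ISO date such as 2007-03-07.
    PHP's strtotime() understands far more than this.
    """
    today = today or date.today()
    word = arg.strip().lower()
    if word == "today":
        return today
    if word == "yesterday":
        return today - timedelta(days=1)
    # Raises ValueError if the argument is not an ISO-format date.
    return date.fromisoformat(word)
```

Passing today explicitly is only there so the function can be checked against a fixed date; a cron job would just call parse_day("yesterday").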

I'm still going to have to go back through the previous days that I've been using delicious, but that's just a matter of scripting my way back through the days and filling in the archives. Either way, this will work going forward.
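Scripting back through the days could look something like the following. Again, this is a Python sketch rather than anything from the PHP script: days_between and backfill_commands are hypothetical helpers, and the only thing taken from the post is the "php bydate.php YYYY-MM-DD" command line.

```python
from datetime import date, timedelta
from typing import Iterator


def days_between(first: date, last: date) -> Iterator[date]:
    """Yield every date from first through last, inclusive."""
    current = first
    while current <= last:
        yield current
        current += timedelta(days=1)


def backfill_commands(first: date, last: date) -> list[list[str]]:
    """Build one `php bydate.php YYYY-MM-DD` command per day in the range."""
    return [
        ["php", "bydate.php", day.isoformat()]
        for day in days_between(first, last)
    ]
```

Each command list can be passed to subprocess.run in turn. If the start date is later than the end date, nothing is generated.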

To use the script, you need to change the delicious username and password as well as the path to the archive. Those values are repeated throughout the script, but hey, I'm recovering from surgery, so that's as much as I'm willing to do to clean it up.

The PHP script itself is available as a highlighted PDF as usual.

Now, I think it's time for a nap.

 


© 2003-2010 J Wynia. All original content is licensed under the terms of the Creative Commons Attribution license unless otherwise noted. Content from other sources is licensed under its original terms.