The Glass is Too Big

Archiving Full-Text Copies of Your Bookmarks

Originally published on: 3/7/2007 12:11:28 PM

I bookmark a LOT of stuff, and much of it is stuff that I know I'm going to want to find again. Tools like del.icio.us have made this quite a bit easier. However, for delicious to be of any use, you usually have to augment each bookmark with relevant tags and a description. Even then, six months later when I actually want the information, I may be looking for it in a different context than the one I originally filed it under.

Given that I don't really want to do much more than hit a keystroke to indicate that I want to keep the page for future reference, the "normal" way of using delicious pretty much bugs me. And unless I use it that way, it becomes difficult to retrieve anything from it. What I've *really* wanted for a while is to just store a copy of each page I bookmark and then have full-text search over that content.

In between the pain-killer fog, I thought of a quick way to accomplish this and tackled the problem with a short PHP script.

So, I took the basic code I wrote for doing a monthly delicious summary and modified it a bit. It still builds an HTML document listing the bookmarks for a given day. However, this whole setup is based off of a directory structure like this:



2007
    01
    02
    03
        01
        02
        03
        04
        05
        06
        07



Basically, there's a directory for each year, with months inside it and days inside each month. Each day's directory holds the HTML summary as well as copies of the bookmarked pages, which are fetched with "wget". The script uses the delicious API again to get the bookmarks. It's intended to be run with the command-line PHP interpreter, not through the browser.
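As a rough illustration of that layout, here's a minimal sketch (not the original script) of how a day's directory and a wget call could be put together. The function names, the archive root under the system temp directory, and the example URL are all made up for illustration, and the wget flags are just one reasonable choice.

```php
<?php
// Hypothetical helper: map a timestamp to the archive's year/month/day directory.
function archive_dir(string $archiveRoot, int $timestamp): string
{
    return rtrim($archiveRoot, '/') . '/' . date('Y/m/d', $timestamp);
}

// Hypothetical helper: build (but do not run) a wget command that saves one page into $dir.
function wget_command(string $url, string $dir): string
{
    // -q keeps wget quiet; -P sets the directory the download is saved into.
    return 'wget -q -P ' . escapeshellarg($dir) . ' ' . escapeshellarg($url);
}

// Assumed archive root; the real script has its own path setting.
$root = sys_get_temp_dir() . '/bookmark-archive';
$dir  = archive_dir($root, mktime(0, 0, 0, 3, 7, 2007));

// Create the year/month/day directories in one go if they don't exist yet.
if (!is_dir($dir)) {
    mkdir($dir, 0755, true);
}

// Printed instead of executed so the sketch has no network side effects.
echo wget_command('http://example.com/', $dir), "\n";
```

In a real run you'd loop over the URLs that came back from the delicious API and hand each command to something like exec().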



php bydate.php 2007-03-07



The date argument can be anything that PHP handles via strtotime(), which means that:



php bydate.php yesterday



will work pretty much like you'd expect. That's what I intend to use in a daily cron job, so that every day the server just grabs and archives that day's bookmarks for me. I then plan to point a full-text search tool like Lucene or Beagle at the archive, which should give me the kind of retrieval I'm after.
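For the curious, the date handling can be sketched roughly like this. It's an assumption about how such a script might normalize its argument, not a copy of bydate.php, and normalize_date_arg() is a name invented for the example.

```php
<?php
// Hypothetical helper: turn a CLI date argument into a Y-m-d string,
// or null if strtotime() can't make sense of it.
function normalize_date_arg(string $arg, ?int $now = null): ?string
{
    // The second argument is the base time that relative phrases like "yesterday" count from.
    $ts = strtotime($arg, $now ?? time());
    return $ts === false ? null : date('Y-m-d', $ts);
}

// Fall back to today's date when no argument is given.
$arg = $argv[1] ?? 'today';
$day = normalize_date_arg($arg);

if ($day === null) {
    fwrite(STDERR, "Could not parse date: $arg\n");
    exit(1);
}

echo $day, "\n";
```

Checking for false matters because strtotime() quietly returns false for input it doesn't understand, and feeding that to date() would otherwise point the script at 1970.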

I still have to go back through all the previous days that I've been using delicious, but that's just a matter of scripting my way back through the dates and filling in the archives. Going forward, though, this will just work.
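That backfill could look something like the sketch below. The days_between() helper and the date range are placeholders I made up for the example; you'd substitute the first day you actually started bookmarking.

```php
<?php
// Hypothetical helper: list every day from $start to $end (inclusive) as Y-m-d strings.
function days_between(string $start, string $end): array
{
    $days = [];
    $day  = new DateTimeImmutable($start);
    $last = new DateTimeImmutable($end);

    while ($day <= $last) {
        $days[] = $day->format('Y-m-d');
        $day    = $day->modify('+1 day');
    }

    return $days;
}

// Placeholder range; printed rather than executed so nothing is fetched by accident.
foreach (days_between('2007-03-01', '2007-03-07') as $day) {
    echo 'php bydate.php ' . escapeshellarg($day), "\n";
}
```

Swapping the echo for passthru() would run each day for real, though a short sleep() between days is probably polite to the delicious API.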

To use the script, you need to change the delicious username and password as well as the path to the archive. Those values are repeated throughout the script, but hey, I'm recovering from surgery, so that's as much as I'm willing to do to clean it up.

The PHP script itself is available as a highlighted PDF as usual.

Now, I think it's time for a nap.

© 2003-2010 J Wynia. Very Few Rights Reserved.