Setting Up an OpenID with PHP

Jan 15, 2007

One of the best simple things I ever did online was to register wynia.org as a domain name. Ever since, I haven't needed to worry about *which* email address I used on a given site when I signed up. I don't have to have some other company's name as part of my web URL on my resume, and, if you can remember my name, you can find my site and send me email.

Given the pain that went away with having a permanent "home" online, I've been eyeing the OpenID activities with keen interest. While I'm semi-OK with site-specific logins for things like banking and other sensitive data, I'm thoroughly sick of having to create a login for every single forum, discussion group, etc. that I want to participate in or non-critical site I want to use.

OpenID gives you a single URL that you use as your identity on other sites. It's secured with a password. The site you're wanting to use just has you enter your OpenID and they use your OpenID server to authenticate. Next thing you know, you're using the site as "you" without a new account somewhere.

If you're looking to just have an OpenID for yourself and maybe one or two other people, phpMyID is a nice, simple solution. I've had installing it on my TODO list for quite a while and finally got around to it last night.

It comes with a core class that you include into each identity that you want the server to handle. The identities are each in their own PHP file, named whatever you want. If you're only going to serve up one OpenID (like I am for right now), you can name it openid.php. Anyone else who wants to use my server will get a similar file, named username.php, etc. I named mine this way to have an easy to remember and nice-looking URL.

At any rate, in the openid.php file, there are a few things you need to change, including the username and the password. However, rather than storing the password itself in the file, you store an md5() hash in its place.

To generate the appropriate string to use as a password, you need to md5() a combination of things, strung together with colons between them. I just made a one line PHP script that spits out the appropriate value to put into the script.


<?php
print(md5("username:phpMyID:yourpassword")); // the parts are joined with colons

I put the result of that in as the "auth_password" and set the rest of the stuff to fairly obvious settings. I then put the openid.php file on my personal subdomain: j.wynia.org. So, my new OpenID URL is: http://j.wynia.org/openid.php.

Once you think you've got your OpenID set up properly, you can test it out via the test script at OpenIDEnabled.com. I just gave it the OpenID URL above and logged in without any glitches.

Overall, I like this approach a LOT better than the more centralized models for single account signon because I can control it. It's not going to go away because some startup decided to spend all of their money on massages for employees' pets and went bankrupt.

I still need to integrate the autodiscovery into the site properly, but the test login works, so I'm happy.
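For the autodiscovery piece, OpenID 1.x consumers look for two link tags in the head of whatever page you hand them as your identity: openid.server and openid.delegate. Here is a minimal sketch in Python of generating them. It assumes both point at the openid.php URL above, which is my understanding of the usual phpMyID arrangement rather than something verified here, and the function name is made up for illustration.

```python
from html import escape


def openid_link_tags(server_url, delegate_url):
    """Return the two OpenID 1.x discovery <link> tags, one per line."""
    tags = [
        '<link rel="openid.server" href="%s" />' % escape(server_url, quote=True),
        '<link rel="openid.delegate" href="%s" />' % escape(delegate_url, quote=True),
    ]
    return "\n".join(tags)


if __name__ == "__main__":
    # Assumption: server and delegate are the same phpMyID file.
    url = "http://j.wynia.org/openid.php"
    print(openid_link_tags(url, url))
```

The output would be pasted into the head section of the page you want to type in as your OpenID, so that a shorter URL can stand in for the full path to openid.php.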

Bayesian Filtering of RSS Feeds with POPFile

May 09, 2006

For a while I've been wanting to try using POPFile to filter my feeds. I've already been using POPFile to handle my spam filtering, with 99.99% accuracy. POPFile ranked high on my list of solutions because it handles an arbitrary number of buckets out of the box. Most people only know about Bayesian text analysis in a spam filtering capacity, but it's perfectly capable of arbitrary classifications.

That said, I'm only using 2 buckets: interesting and uninteresting. To apply POPFile to RSS feed reading was much simpler for me than for many others. As you may already know, I read all of my feeds as emails delivered to a dedicated IMAP account via Outlook and Thunderbird. As such, by the time I read feed items, they *are* emails and all email tools can be applied, including POPFile.
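To make the "buckets" idea concrete, here is a toy naive Bayes classifier in Python. This is not POPFile's code or its algorithm in detail. It is a minimal sketch, assuming simple word counts and add-one smoothing, and the class name and sample texts are invented for illustration. It works with any number of buckets, though the example uses the same two as above.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes:
    """Toy multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.doc_counts = Counter()              # bucket -> items trained

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, bucket, text):
        self.doc_counts[bucket] += 1
        self.word_counts[bucket].update(self.tokenize(text))

    def classify(self, text):
        """Return the bucket with the highest log-probability, or None if untrained."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        words = self.tokenize(text)
        best_bucket, best_score = None, float("-inf")
        for bucket, n_docs in self.doc_counts.items():
            counts = self.word_counts[bucket]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # prior for this bucket
            for word in words:
                # Add-one smoothing so unseen words don't zero out a bucket.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


if __name__ == "__main__":
    clf = TinyBayes()
    clf.train("interesting", "new php openid library released")
    clf.train("interesting", "bayesian filtering for rss feeds")
    clf.train("uninteresting", "new ipod case and cellphone ringtones")
    clf.train("uninteresting", "cellphone deals this week ipod")
    print(clf.classify("ipod cellphone accessories"))
```

Re-training on a misclassified item is just another call to train() with the correct bucket, which is the same loop as hand-checking and correcting classifications.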

My intent to try this out came when I saw a new IMAP module in POPFile. Normally POPFile works on POP3 email accounts, but this opens other avenues.

I'm also intrigued by the possibility of using the XMLRPC module to do direct RSS to POPFile communication and classification.

Anyway, after approximately 24 hours, it's processed 1200 RSS feed items with 70% accuracy. I hand-checked all of the classifications and re-trained it on the 30% it got "wrong".

If the accuracy gets closer to 99% (which it has for my spam in the email arena), this will cut the effort to keep up with this many feeds dramatically: a capacity I'll surely use to just track more stuff.

The current ratio points to about 33% of the items that come through as being worth at least a quick read. That's my criterion for "interesting". If I would have opened the item and read the first couple of sentences, I consider it "interesting".

One more idea I've got is to put up a public feed database and run all of those items through my personal filter. Other people could add feeds to the pile and it would spit out a digital version of my opinion. Also possible would be a Bayesian Digg.com where the stories are picked from the giant feed database based on thumbs up/thumbs down voting on stories.

There's some real potential in neural net, statistical analysis and company for filtering and selecting content and I can't wait to see what comes of it.

You can see the resulting selected items, filtered by POPFile and then by me (to see anything I kept for one reason or another) at:

http://www.wynia.org/saved_items.html

Apparently I *Do* Subscribe to a Lot of Feeds

May 07, 2006

Today I decided to upload my OPML feed subscriptions to the new Share Your OPML service. Apparently there are only 4 other people (at the time of this writing) using it who have more feeds in their list than I do.

I'll admit it. I'm completely sold on RSS as a way of reading great information. There's no way I'd keep up on as many topics as I do with some sort of bookmark system. Drop every new article into an email account with nice sorting features, etc. and it's MUCH easier to manage.

I fully expect the number (currently 457) to only grow over the next year or two. I do want some better tools to filter, but that's all I really want to keep scaling it. For instance, I read Gizmodo and a couple of other gadget sites. However, I really don't care about iPods and cellphones. I just don't.

I want to pull that part of those feeds out. It seems when someone gets too many feeds, they only really consider completely dropping feeds out of their subscription. The reality is that every feed has a certain percentage of items that we want to see. For some, it might be in the 90-100% range, but not for all. I know there are some that are closer to 5% in my list, but that 5% is worth its weight in ethereal gold.

I'm going to be putting POPFile IMAP into play to see if I can't get that working better for me. I guess we'll see. And, with the comparison features to help people find more feeds, I'm likely to need that extra filtering sooner rather than later.

Yahoo Music Unlimited, Recommendation Engines and Serendipity

Apr 11, 2006

Last night, I plunked down the plastic to get a Yahoo Music Unlimited To Go (would you like a non-fat soy latte with no foam and a double shot with that?) month-to-month subscription. I hadn't done it before, but there was a recent update to my mp3 player that made it compatible with the service.

Basically, for the price of a single CD every month, you have access to the million or so songs that they have in their library on up to 3 PCs and your portable device. For once, the terms are actually an exact match to how I listen to music. Any music I listen to that isn't over-the-air radio is coming from either my laptop or my mp3 player.

After a quick perusal to ensure that it wasn't just 1 million remixes of Eminem and The Black Eyed Peas, I gave it a shot. While the software is disappointing, I'm beyond being surprised by audio software that is hard to use. If it weren't for audio sync packages and scanner software, I wouldn't have a bottom to my usability calibration scale.

Digressions aside, I managed to pull down a few albums and get them onto the mp3 player, thus accomplishing the overall goal. This morning, I've been exploring the other bits of the Yahoo setup, which include a recommendation engine and thus the point of this post.

Recommendation engines are the most obvious use of recorded attention data. When you talk to some folks, you'd think that they're the only use, but it's clear that, at the very least, this space is how we're going to explore the tools for applying recorded data. Amazon really has pushed this along, and services like AmigoFish, Last.FM and, to a lesser degree, Netflix have all been doing this kind of thing, though predominantly by asking you to rate items rather than by watching what you do.

That's understandable, as the mechanisms aren't really mature for watching what DVDs you pull off the shelf and put into the player. At the moment it's easier (for the provider of the recommendations) to just ask people to do the tracking manually. Eventually, that will have to go away. When it does, the whole thing will explode in usefulness. Until then, they're still interesting.

At any rate, what I wanted to talk about with recommendation engines is the idea of serendipity as a variable in the equation. Oftentimes, when I talk about having the computer watch your actions and figure you out, people have a visceral reaction. They are often assuming that the default implementation is for the computer to take complete action: delete files directly, buy music for you, open programs for you, etc. And, were those kinds of things part of an implementation, you'd have every right to fear how it could run unchecked.

However, really good implementations will act more like a personal chef, personal maid, personal assistant, etc. than a parent. When you were a little kid, you had someone who cooked for you, drove you around, cleaned the bathroom you used, and otherwise took care of these kinds of things. However, they did so on their terms. You didn't have much input into menu selection (I know I didn't get pizza and ice cream constantly) or whether they decided to run by Target on the way home from school. And, when most of us were old enough, we left that behind, because we don't want someone else running the show. As adults, we have a pretty strong reaction to someone trying to be our parents.

If you had a personal chef, after they'd cooked for you for a while, they'd probably just start suggesting meals that they know you like. Really good ones would introduce a bit of serendipity, which is a critical variable in recommendation engines. They'd, for instance, say that because you like pizza, you might like a calzone. Now, depending on your level of comfort and how much serendipity you like, they might only suggest moving from Canadian bacon pizza to a Canadian bacon calzone (small level of serendipity) or might suggest moving from Canadian bacon pizza to a Thai peanut calzone (large level of serendipity).

Really good systems will let you adjust the level of serendipity you're comfortable with (and REALLY good ones will figure it out on their own). Yahoo's music engine lets you do this (to a point), by toggling a few options to favor positively rated items, etc. This gives you the serendipity factor. If you tighten it down, you'll only get songs from your favorite artists and that's it. Only songs you've already told it you like.

This is similar to listening to a classic rock or oldies station. You know all of the songs they'll be playing and there won't be anything new (by definition). If you allow a bit of serendipity, you'll start getting a new song here and there, much like stations that play "the best of yesterday plus the hits". And, if you open it up completely to serendipity, it's like listening only to the stack of CDs your music nut of a friend lends you, full of artists you've never heard of.
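One way to picture serendipity as a dial is as the fraction of a playlist drawn from music you haven't rated. The sketch below is purely illustrative Python and has nothing to do with how Yahoo's engine actually works. The function name, parameters and track names are all assumptions.

```python
import random


def build_playlist(favorites, discoveries, serendipity, length, seed=None):
    """Mix known favorites with unfamiliar tracks.

    serendipity runs from 0.0 (favorites only) to 1.0 (discoveries only).
    """
    if not 0.0 <= serendipity <= 1.0:
        raise ValueError("serendipity must be between 0.0 and 1.0")
    rng = random.Random(seed)
    # How many slots go to unfamiliar tracks, capped by what is available.
    n_new = min(round(length * serendipity), len(discoveries))
    n_known = min(length - n_new, len(favorites))
    playlist = rng.sample(discoveries, n_new) + rng.sample(favorites, n_known)
    rng.shuffle(playlist)
    return playlist


if __name__ == "__main__":
    known = ["fav-a", "fav-b", "fav-c", "fav-d"]
    unknown = ["new-w", "new-x", "new-y", "new-z"]
    for level in (0.0, 0.25, 1.0):
        print(level, build_playlist(known, unknown, level, length=4, seed=1))
```

At 0.0 this is the oldies station, at 1.0 it is the borrowed stack of CDs, and values in between give the mostly familiar mix with a new song here and there.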

Any recommendation engine needs to have a level of serendipity (or you'll just get the stuff you've already seen) and Yahoo's is doing some of the right things to let you control it. From what I see so far, it (along with Last.fm) will be a central part of how I find new music without spending much time doing so.

And, if you're looking to figure out if there might be a service you like better than Yahoo's, check out TechCrunch's comparison of the services, which just came across my desk this morning.

Beyond Folksonomies SXSW Panel Podcast Files

Mar 30, 2006

Looks like the folks at SXSW have put out a complete schedule of when the podcast/mp3 versions of the panel presentations will be available. If you check out the SXSWi panel podcast schedule, you'll see that my panel's audio will be available on April 6. If you didn't attend the panel or are just curious, that will be when you can participate after the fact.

It's also an interesting schedule to look at because it will let you know when any of the other panels you were interested in or missed (which is all of them if you didn't attend) will be available.

If you are lazy and just want to grab all of the MP3s as they come out, put the podcast RSS URL into a podcast receiver like Juice. It will take care of grabbing them as they come out, and you can filter through them after the fact (that's what I'm doing).

While I agree that much of the benefit in attending is in meeting people, etc., I am not anywhere near convinced that that is the primary or even the majority of the benefit. Personally, I find experiences like this valuable in how they provide a wide array of input into your thinking. With 3-4 sessions per day, a couple of meals per day and your evening leisure, the permutations are gigantic. Each person who attends actually attends a completely unique conference. And, because it's 4 days of constant input, you tend to think about things on day 2 that wouldn't have occurred to you except for what the events of day 1 stimulated you into thinking.

Thus, I think the primary purpose of these conferences is mental stimulation. And, that stimulation *can* happen using just the podcasts. So, if you can't go to conferences, take advantage of these files and don't let people tell you that they aren't valuable.


J Wynia

For better or worse, I'm the guy who runs things here. I'm a web consultant, software developer, writer and geek from Minneapolis, MN. This site is a fairly wide cross-section of the things I'm interested in and enjoy writing about.

Oh, and if you happen to be looking for hosting for your Subversion repositories or just web hosting in general, take a look at Dreamhost. It's what I use for Subversion and your signup helps me out.

© 2003-2008 J Wynia. All original content is licensed under the terms of the Creative Commons Attribution license unless otherwise noted. Content from other sources is licensed under its original terms.