Recording Your Attention: Spying on Yourself

Originally published: 03/2006 by J Wynia

Between the spyware companies, marketing companies, survey companies, and cross-site cookies, it seems everyone wants to know all about you, what you read and what sites you visit. And, they want to sell that information to each other at an alarming pace and price. They all clearly think that the information about the sites you visit is valuable. They can glean all kinds of insights about you from it.

I've read several times that if they have age, sex and zip code, they can get really pretty accurate targeting to an individual. So, why is the information about me not easily available to me to glean insights from? Why can't I follow my own trail and look back at my journey?

Well, with AttentionTrust you can keep track of the sites you visit from all of your computers. To push it a step further, and get absolute control over your clickstream and also have easy access to deep analysis, you can set up your own attention server. Which, last night, I did.

I've been running the AttentionTrust extension with local logging and logging to the RootVaults service for a while. However, the reporting at RootVaults is pretty much limited to a list of sites you visit regularly. Beyond that, I like being not just a little bit in control of my data, but completely in control of it, which is why I am not nearly as big a fan of hosted services as some others are.

See, I want to know not only what sites I visit, but also what keywords are relevant to that content. I want to be able to search through the full text of those pages later (you know, to find that stuff you know you saw "somewhere"), and to analyze the whole big pile to train a digital agent to help dig out interesting content. I want to ferret out how what I look at in the morning differs from what I view in the evening, on client sites, or on the weekends. I want to know lots of things I haven't even thought of yet.

Grand visions aside, I was able to get their demo server up and running and it's tracking my path diligently. The demo server is pretty simple and I'll likely replace it with something a bit more robust and secure. Especially if you wanted to run a server for multiple users, the data shouldn't be stored in plain text. I'd prefer to see the user-supplied password used as an encryption key that encrypts everything as it's stored.

That would leave the data ONLY available to the owner. The site operator wouldn't be ABLE to dig through the data. Go a step further and split each user's data out into a separate SQLite file and you've got a system where, even if someone breaks into the server, the data is siloed for each user and useless without the password, which isn't stored anywhere in the system. Of course, this has the downside of all of a user's data being locked up if they lose their password. There are no "resets" with this kind of system, but there is absolute privacy.
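To make the "password as encryption key" idea concrete, here's a minimal sketch in Python of the key-derivation half. It's an illustration under my own assumptions, not anything the demo toolkit does: the function names, the work factor and the sample password are all made up. The random salt can sit next to the data because it isn't a secret; the password and the derived key never get written anywhere.

```python
import hashlib
import os

# Assumed work factor for this sketch; tune it for your own hardware.
PBKDF2_ROUNDS = 200_000


def new_salt() -> bytes:
    """Random per-user salt. It can be stored with the data; it is not a secret."""
    return os.urandom(16)


def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch the user's password into a 32-byte key. Nothing is written to disk."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS
    )


# Hypothetical usage: the same password and salt always give the same key,
# so the key can be rebuilt at login instead of being stored.
salt = new_salt()
key = derive_key("correct horse battery staple", salt)
same_key = derive_key("correct horse battery staple", salt)
wrong_key = derive_key("not the password", salt)
```

The derived key would then be handed to a vetted cipher from a real crypto library to encrypt each row. That part is left out here because Python's standard library doesn't ship one, and rolling your own is a bad idea.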

At the moment, this whole path is seen as a "development" setup that will get overwritten whenever you upgrade the extension. I'm digging through the extension source to see if there's a reasonable way to modify it for more of a roll-your-own kit.

Ideally, a record-your-own-attention toolkit would be a single bundle. In that bundle would be the server PHP scripts and everything else you need to run your own attention server.

You would install the PHP files and change the permissions on a directory for write access. In that directory would go the config file as well as an SQLite database (instead of the MySQL that this has at the moment) for each user. That database can be taken whole by the user from the system (for offline analysis) or deleted if they wish it to be purged. You run a setup script that asks you to create a user, which is stored only as a valid user (no password stored) with a pointer to the appropriate database file.
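Here's a rough sketch of what that setup script might do, written in Python rather than PHP to keep it short. Everything named here is hypothetical: the "users.sqlite" registry file, the "clicks" table and its columns are my own placeholders, not the demo's real at_clicks schema. The point is only that the registry holds a username and a pointer to a file, no password, and that purging a user is just deleting that file.

```python
import sqlite3
import tempfile
from pathlib import Path

REGISTRY_NAME = "users.sqlite"  # hypothetical name for the list of valid users


def _open_registry(data_dir: Path) -> sqlite3.Connection:
    conn = sqlite3.connect(data_dir / REGISTRY_NAME)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, db_file TEXT NOT NULL)"
    )
    return conn


def create_user(data_dir: Path, username: str) -> Path:
    """Register a user (no password stored) and create their own database file."""
    if not username.isalnum():
        raise ValueError("username must be alphanumeric")  # it becomes a file name
    data_dir.mkdir(parents=True, exist_ok=True)
    db_path = data_dir / f"{username}.sqlite"
    registry = _open_registry(data_dir)
    try:
        with registry:  # commits on success
            registry.execute(
                "INSERT INTO users (username, db_file) VALUES (?, ?)",
                (username, db_path.name),
            )
    finally:
        registry.close()
    user_db = sqlite3.connect(db_path)
    try:
        # Hypothetical columns, loosely based on what a click record might hold.
        user_db.execute(
            "CREATE TABLE IF NOT EXISTS clicks ("
            "visited_at TEXT, protocol TEXT, domain TEXT, path TEXT)"
        )
    finally:
        user_db.close()
    return db_path


def purge_user(data_dir: Path, username: str) -> bool:
    """Drop the user from the registry and delete their database file."""
    registry = _open_registry(data_dir)
    try:
        row = registry.execute(
            "SELECT db_file FROM users WHERE username = ?", (username,)
        ).fetchone()
        if row is None:
            return False
        with registry:
            registry.execute("DELETE FROM users WHERE username = ?", (username,))
    finally:
        registry.close()
    (data_dir / row[0]).unlink()
    return True


# Hypothetical usage in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    alice_db = create_user(data_dir, "alice")
    created = alice_db.exists()
    purged = purge_user(data_dir, "alice")
    gone = not alice_db.exists()
```

Because each user's data lives in one file, "take it whole for offline analysis" is just a file copy.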


The user authenticates with HTTP authentication (currently the biggest security hole as the password goes in plain text) or a better login via a website to set the user id cookie (the extension sends data just like the browser would). The supplied password isn't stored, but is used as the encryption key to store everything. Reports would use the same key to unlock data for viewing.
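To see why I call HTTP authentication a hole, here's a small Python sketch of what Basic authentication actually sends. The username and password are made up. The header is just "username:password" run through base64, which is an encoding and not encryption, so anyone who can see the request on a plain HTTP connection can reverse it.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a client sends for Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header: str) -> tuple:
    """Recover the username and password from a Basic auth header value."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


# Hypothetical credentials: what goes over the wire is trivially reversible.
header = basic_auth_header("alice", "s3cret")
recovered = decode_basic_auth(header)
```

Since the password doubles as the encryption key in the design above, a sniffed header would give away both the login and the data, which is why the transport needs protecting before any of this is trustworthy.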

OK, so you don't want to wait for me or someone else to make my ideal version work? Here's how to install the demo yourself.

  1. Install the AttentionTrust plugin for Firefox. There's also a proxy server for Safari and I'd expect something soon that will hook into Squid too. However, these instructions assume Firefox.
  2. Open the Extensions section of Firefox and right-click the Attention Recorder to get to its Options.
  3. Set it to record to a local file and browse for a location.
  4. Restart Firefox and visit a couple of websites.
  5. Check to see that the file you set is growing in size. If you open it and see the sites you visited, you've got the extension itself working. However, as XML files grow, they get really difficult to dig through and it won't record from your other computers, so we still want to do the server thing.
  6. Grab the PHP demo toolkit and upload the unpacked contents to your PHP server.
  7. Create a MySQL database (or make sure you won't have a problem with a new table called "at_clicks" in an existing one) and load the at.sql database table into your new MySQL database. It always bothers me that this step is necessary given that the script author could just as easily make that SQL insert as part of a setup script instead of making the user install it.
  8. Edit the config.inc.php file for your database parameters and set a username and password (that you'll use as a user) in that file as well.
  9. Visit the directory where you installed it on your web server, e.g. http://www.example.com/attention/
  10. You should get asked for a username and password. If not, you've got a problem and it's likely the rest won't work either.
  11. In Firefox, go to "about:config" in the address bar.
  12. Filter for the key: "attention.attentionbankSendList".
  13. That key should contain "LOCAL". Add the path to the "record.php" script in the attention script directory. It should be something similar to "http://www.example.com/attention/record.php".
  14. Browse away.

Your database will now fill up with your clickstream. After a few minutes, go back to the /attention/ URL and you should see the rudimentary report.
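Once the rows pile up, you don't have to stop at the built-in report. As a taste of the morning-versus-evening question from earlier, here's a minimal Python sketch that counts visits per domain by part of day. It assumes a hypothetical "clicks" table with "visited_at" and "domain" columns and uses a few made-up sample rows; the demo's real at_clicks table will have its own column names, so adjust the query to match.

```python
import sqlite3
from collections import Counter
from datetime import datetime


def part_of_day(timestamp: str) -> str:
    """Bucket an ISO-format timestamp into morning, afternoon or evening."""
    hour = datetime.fromisoformat(timestamp).hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"


def visits_by_part_of_day(conn: sqlite3.Connection) -> Counter:
    """Count visits per (part of day, domain) pair."""
    counts = Counter()
    for visited_at, domain in conn.execute("SELECT visited_at, domain FROM clicks"):
        counts[(part_of_day(visited_at), domain)] += 1
    return counts


# Made-up sample data in an in-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (visited_at TEXT, domain TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [
        ("2006-03-04T08:15:00", "www.wynia.org"),
        ("2006-03-04T08:40:00", "www.wynia.org"),
        ("2006-03-04T21:05:00", "www.example.com"),
    ],
)
counts = visits_by_part_of_day(conn)
conn.close()

for (period, domain), n in sorted(counts.items()):
    print(period, domain, n)
```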

After a few weeks or months, you can start digging deeper through the growing pile of data. Enjoy.

Comments

Alex Barnett on 3/4/2006
Nice one! What would it take to make the db SQL Server Express Edition?
J Wynia on 3/4/2006
I believe that the toolkit is using the PEAR database abstraction layer, which I think handles ODBC just fine. The thing that I think I'm going to do next is actually do a parallel "recorder" that just dumps out what it sends. Based on that, you could write an attention recorder that works with their Firefox extension in any server-side language you want with any database you want. In the end, it's just inserting a row of information like "http" as protocol, "www.wynia.org" as domain, user id, ip address, etc. into the database. This is the same kind of stuff a web server log records, so the server just needs to authenticate and identify the user and process each basic request. If the middle (the HTTP requests themselves) is documented (and I don't think they're very complicated at all), it'd be fairly easy to write a recorder any way you want.
Alex Barnett on 3/4/2006
Cool, let me know how it goes. Am sub'd to this post's comments and your main feed. thanks. Alex.
Bozo on 3/5/2006
Name 5 specific things that make your daily life better by doing this. Name 10 things you found by doing this you would not have found in any other way (e.g., using the browser's History). I think you need a better hobby, something else to do with your time.
J Wynia on 3/5/2006
First, I find it funny that my doing it was a complete waste of time, but your commenting on it isn't and the list you want me to make wouldn't be. I'm willing to make a substantial bet that 3 years from now, no one will need to explain it to you. It will be self-evident. However, those kinds of things often don't make sense to anyone but the people working to bring them to the forefront. At this point, if you don't get it, I could probably make a list 100 items long and you still wouldn't get it. As far as my time goes, ask anyone who knows me very well and I'll stack my 168 hours of time per week up against most. Implicit in your statement is that *your* last week contains only items of greater significance than the hour I put in installing this stuff and putting the article together. Are you really going to tell me that you spent every hour of the last week in activities that were better uses of time than this (even according to your standard that it was useless, which I disagree with)? If so, I'd love to see your accounting of a full 168 hours in a week as a list. When you've done that, I'll consider some accountability for how I spend my time. And, if you're looking for why anyone might do this, you'll have to go back quite a distance. As Socrates said, "The unexamined life is not worth living."
Aaron Westre on 3/6/2006
I can account for about 90% of my time, which is less impressive when one considers that I have other people managing my time. But, given some of the tasks said people have given me recently, I'd say this was definitely not a waste of time.
Alex Barnett blog : Record-your-own-attention toolkit on 3/6/2006
[...] Record-your-own-attention toolkit Now here's an Attention data experiment. This time by J Wynia. "I’ve read several times that if they have age, sex and zip code, they can get really pretty accurate targeting to an individual. So, why is the information about me not easily available to me to glean insights from? Why can’t I follow my own trail and look back at my journey?" What he's done is very interesting. He's hooked the AttentionTrust recorder extension to connect to an instance of Root Vaults server that he's running on his own webserver. He's now recording and storing his own clickstream data in his database, which he could connect to any other service he chooses. Or not. He's documented all this so you can try too... "After a few weeks or months, you can start digging deeper through the growing pile of data. Enjoy." Why? Because it's his data. And he can. - Tags: Attention   [...]
BillyG on 3/7/2006
I hope I don't make the "I could probably make a list 100 items long and you still wouldn’t get it" list so I'll give it a shot and go from there lol. At least last time I opened my mouth I finally found out that my host setup was the problem lol. Thx.
BillyBLOGirlardo · Setup Your Own Attention Server on 3/7/2006
[...] I’m gonna give Mr. Wynia’s post a careful read all the way thru first though because last time I attempted some Wynia magic, I spent a bunch of time spinning my wheels due to my host setup not obliging to his config and don’t want to go thru that again. Thanks Alex. Attention, RootVault, Transparency [...]
J Wynia on 3/7/2006
No, that was specifically aimed at the person who demanded that I:
Name 5 specific things that make your daily life better by doing this. Name 10 things you found by doing this you would not have found in any other way (e.g., using the browser’s History).
I'm always 100% OK with someone not getting the technical implementation kind of stuff. I just get a little hot under the collar when someone thinks they should be the arbiter of how I spend my time and what's worth doing. His was an unreasonable demand that I justify to him why I did it. And, I'm contrarian enough to have responded publicly.
BillyG on 3/7/2006
I'm sorry J., I wasn't clear in my wording, gets me every time lol. I AGREE 1000% WITH YOUR RESPONSE TO HIM. I don't want anybody telling me how to spend my time either. I was just jabbing at myself for not grasping your setup on my last attempt with (??) something here. (FYI: I had used your search box but couldn't find me and couldn't find your homepage either, i.e. your header doesn't link to Home etc.) Stay true. Sorry for the confusion.
elliptical . . . » Blog Archive » Recording Your Attention: Spying on Yourself– The Glass is Too Big - J Wynia on 3/19/2006
[...] Recording Your Attention: Spying on Yourself — J Wynia [...]
Christian on 3/20/2006
I will try and ask more politely what the previous poster asked. This is an honest question. What value do you see in something like this? What will tracking this information do for me?
Jon Galloway on 3/20/2006
I'm an avid follower of the attention concept, but no one is really doing anything with the attention data at this point. Hopefully we're populating the database for future applications which will mine it and actually do something useful. Last.FM is an example of using musical "clickstream" data, but I'm still waiting for the web equivalent.
J Wynia on 3/21/2006
First, I want to thank you for asking more politely. The problem with explaining it to anyone who doesn't already get it (and I tried several times at SXSW) is that in every case, people want examples. And, like Jon says, there aren't many examples of the uses on this side of the equation. This is due in large part to the fact that most of the eventual uses of this data require quite a bit of it to pile up before you can do anything useful. There ARE examples on the negative side that most people are familiar with. Think of spyware. Why would a company go out of its way to watch which websites you visit? Because it's valuable data to figure out more about you. Now, while you probably know quite a bit about yourself, there's a lot you don't know. These techniques are driving toward letting us figure those things out ourselves. Think of things like highlighting sites you've already visited (in the last 3 years) on a Google search. Or having your bookmarks sorted by how often you visit the site, not how often you click the bookmark. Amazon is another good example. If you buy enough stuff there (see previous point about enough data), their recommendations get pretty darn accurate about things you'd at least be interested in looking at. Now, think about that applied to *everything* on your computer. You're looking at file x. You open file y 85% of the time within 10 minutes of opening file x. So, why isn't a link to file y sitting handily right there? You're writing an email with 4 people in the To: field, but 60% of the time you also add person Z and person A to the CC line. Subtly hint them in. If I want to add them, it can be a single click and if I do nothing, it goes out to the 4 with no change. Every time someone sends you an email with "registration key" in the subject, you send an email that's identical to this template. Do you want to... I do make a HUGE distinction between helping and doing.
I don't want this stuff doing it for me (ordering products), but do want it to help me (filter out companies I have said I hate dealing with). This is also a HUGE difference from "You appear to be typing a letter" because you typed a ":" in Word. This is adaptive behavior based entirely on what you actually do. But, it requires recording what you do and the place that makes the most sense to start is the web, where people are actively seeking to solve these problems: Discovery (finding new and relevant content), Recovery (retrieving content you've seen or written before), and Community (finding people like you). And, when you record your attention, your computer can *help* you do all of those things. Beyond that, when the recorded data is *yours* and not Amazon's or Google's or Microsoft's, you can control who sees it and what happens to it.
JustPlain » Blog Archive » Paying Attention with J Wynia on 3/27/2006
[...] Recording Your Attention: Spying on Yourself Dissecting the Attention Recorder: Write Your Own [...]
© 2003-2013 J Wynia. Very Few Rights Reserved. This article is licensed under the terms of the Creative Commons Attribution License. Quoted content or content included from others is not subject to that license and defaults to normal copyright.