Archiving My Twitter History
I've been using Twitter off and on since late 2006 and have racked up almost 2000 entries in that time. Earlier this year, when Twitter hit stretches where the service was out for entire days at a time, I started using Friendfeed to do the actual posting and pushed those posts over to Twitter.
Part of what sent me over to Friendfeed was a sinking feeling that most of those early posts would go down with the ship, and I wanted some redundancy in where that ephemera was showing up online. Most of those posts were throwaway (see Sturgeon's Law), but I also know how much I enjoy going through slices of complete ordinariness after my memory of the events has faded. So I want to keep a copy of all of this stuff and figure out later what's interesting.
At any rate, given Twitter's problems, I tried to get those old posts out whenever things would work for a patch. Unfortunately, even when things were working, if you went back through the pages of the archive, you'd get about 10 pages in and hit a wall. Given that my Twitter history is about 100 pages long, that left the vast majority of my posts behind that wall, possibly never to be seen again.
However, not wanting to resign myself to just having an archive going forward, I put an entry on my calendar to keep trying every few weeks, just in case they eventually fixed things. So, tonight, the reminder popped up and, what do you know? I could go through all 99 pages of my archive.
Not wanting to lose what may very well be a one-time window of opportunity (they've pretty much said that the instant messaging isn't coming back), I banged out a quick console app to grab all of my posts into files for later processing.
It's mostly copy-n-paste stuff (don't judge, it was written while I was eating curly fries and a roast beef sandwich), but I did include bits that hold off for an hour if it gets throttled by Twitter's limit of 100 requests per hour. If you've got a lot of stuff on Twitter, it may be best to run this overnight.
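The hold-off idea can be sketched on its own as a small helper. This is only a sketch under assumptions: ThrottleRetry, Run, maxAttempts and wait are names made up for illustration and are not part of the app, and unlike the app it gives up after a fixed number of attempts instead of waiting forever.

```csharp
using System;
using System.Net;
using System.Threading;

static class ThrottleRetry
{
    // Hypothetical helper: calls fetch and, when it throws a WebException,
    // sleeps for `wait` and tries again, up to maxAttempts calls in total.
    public static T Run<T>(Func<T> fetch, int maxAttempts, TimeSpan wait)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return fetch();
            }
            catch (WebException)
            {
                // Out of attempts: let the caller see the failure.
                if (attempt >= maxAttempts) throw;
                Thread.Sleep(wait);
            }
        }
    }

    static void Main()
    {
        int calls = 0;
        // Stand-in for a real request: fails twice, then succeeds.
        string page = Run(() =>
        {
            calls++;
            if (calls < 3) throw new WebException("simulated throttle");
            return "<feed/>";
        }, 5, TimeSpan.FromMilliseconds(10));
        Console.WriteLine(page + " after " + calls + " calls");
    }
}
```

For the real thing, the wait would be TimeSpan.FromHours(1) to match the hourly limit, and the lambda would wrap the actual page request.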
The app grabs the posts 20 at a time and saves them to a directory as .atom files. You set the Twitter username and password in the app.config file like this:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <appSettings>
        <add key="TwitterUsername" value="jwynia"/>
        <add key="TwitterPassword" value="IM_NOT_THAT_DUMB"/>
    </appSettings>
</configuration>
The console app itself (with some extra namespaces still in there from stuff I threw away) looks like this. I didn't bother with any of the API wrappers; it just uses basic .NET code.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Configuration;
using System.Collections.Specialized;
using System.Security;
using System.Net;
using System.IO;
using System.Xml;
using System.Xml.XPath;
namespace TwitterArchiver
{
    class Program
    {
        public static String Username { get; set; }
        public static String Password { get; set; }
        public static NetworkCredential TwitterCredential { get; set; }

        static void Main(string[] args)
        {
            // Credentials come from the appSettings section of app.config.
            NameValueCollection appSettings = ConfigurationManager.AppSettings;
            Username = appSettings.Get("TwitterUsername");
            Password = appSettings.Get("TwitterPassword");
            TwitterCredential = new NetworkCredential(Username, Password);

            String OutputDir = Username + "-export";
            Directory.CreateDirectory(OutputDir);

            int NumberOfPages = GetUserPageCount();
            for (int i = 1; i <= NumberOfPages; i++)
            {
                String ArchivePage = GetArchivePage(i);
                WriteStringToFile(ArchivePage, Path.Combine(OutputDir, Username + "-page-" + i.ToString() + ".atom"));
                // Pause one second between page requests.
                System.Threading.Thread.Sleep(1000);
            }
            Console.WriteLine("=====Press any key to continue=====");
            // ReadKey returns on any key press; ReadLine would wait for Enter.
            Console.ReadKey();
        }
        public static void WriteStringToFile(String InputString, String Filename)
        {
            // The using block closes the file even if the write throws.
            using (StreamWriter sw = new StreamWriter(Filename))
            {
                sw.WriteLine(InputString);
            }
        }
        public static String GetArchivePage(int PageNumber)
        {
            Uri TwitterBaseUri = new Uri("http://twitter.com/statuses/user_timeline/" + Username + ".atom?page=" + PageNumber.ToString());
            CredentialCache myCache = new CredentialCache();
            myCache.Add(TwitterBaseUri, "Basic", TwitterCredential);
            WebRequest request = WebRequest.Create(TwitterBaseUri);
            request.Credentials = myCache;
            string responseFromServer;
            try
            {
                // The using blocks release the connection even if reading fails.
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    Console.WriteLine(response.StatusDescription);
                    responseFromServer = reader.ReadToEnd();
                }
            }
            catch (WebException ex)
            {
                // Most likely the hourly request limit: wait an hour, then retry the same page.
                Console.WriteLine("Request failed (" + ex.Message + "); waiting an hour before retrying.");
                System.Threading.Thread.Sleep(1000 * 3600);
                responseFromServer = GetArchivePage(PageNumber);
            }
            return responseFromServer;
        }
        public static int GetUserPageCount()
        {
            Uri TwitterBaseUri = new Uri("http://twitter.com/users/show/" + Username + ".xml");
            CredentialCache myCache = new CredentialCache();
            myCache.Add(TwitterBaseUri, "Basic", TwitterCredential);
            WebRequest request = WebRequest.Create(TwitterBaseUri);
            request.Credentials = myCache;
            int NumberOfPages = 0;
            try
            {
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                using (Stream dataStream = response.GetResponseStream())
                {
                    Console.WriteLine(response.StatusDescription);
                    // Pull the total post count out of the user profile XML.
                    XPathDocument ResponseDoc = new XPathDocument(dataStream);
                    XPathNavigator ResponseNavigator = ResponseDoc.CreateNavigator();
                    XPathNodeIterator ResponseIterator = ResponseNavigator.Select("//user/statuses_count");
                    ResponseIterator.MoveNext();
                    int PostCount = int.Parse(ResponseIterator.Current.Value);
                    // 20 posts per page, plus one more page for any remainder.
                    NumberOfPages = PostCount / 20;
                    if ((PostCount % 20) > 0)
                    {
                        NumberOfPages++;
                    }
                }
            }
            catch (WebException ex)
            {
                // Most likely the hourly request limit: wait an hour, then try again.
                Console.WriteLine("Request failed (" + ex.Message + "); waiting an hour before retrying.");
                System.Threading.Thread.Sleep(1000 * 3600);
                NumberOfPages = GetUserPageCount();
            }
            return NumberOfPages;
        }
    }
}
Build and run that and you, too, can see what you ate for dinner 2 years ago. I think I might put together a highlight post of my personal favorites from my archive to make this worthwhile.
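For the "later processing" part, here is a minimal sketch of reading the saved files back. It assumes each file is a standard Atom feed whose entry elements carry published and title children in the Atom namespace; I haven't verified that against every page, and ArchiveReader and ReadEntries are names made up for this example. It uses LINQ to XML, so it needs .NET 3.5.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

static class ArchiveReader
{
    // The standard Atom 1.0 namespace.
    static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    // Returns one "published<TAB>title" string per <entry> in the feed.
    // A missing child element simply yields an empty field.
    public static string[] ReadEntries(XDocument feed)
    {
        return feed.Descendants(Atom + "entry")
            .Select(e => (string)e.Element(Atom + "published") + "\t" + (string)e.Element(Atom + "title"))
            .ToArray();
    }

    static void Main(string[] args)
    {
        // Assumed directory name, following the "<username>-export" pattern the archiver uses.
        string dir = args.Length > 0 ? args[0] : "jwynia-export";
        foreach (string file in Directory.GetFiles(dir, "*.atom"))
        {
            foreach (string line in ReadEntries(XDocument.Load(file)))
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

Directory.GetFiles makes no promise about ordering, so if order matters, sort the lines by the published field afterwards.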
At any rate, I need to get back to more pressing matters (the economy might be slow, but I'm busier than ever right now and, by the looks of it, into 2009, so I shouldn't complain). Later.

