Colin's Journal

Colin's Journal: A place for thoughts about politics, software, and daily life.

April 7th, 2003

A case study: Sod’s law

So there I was proudly stating that I had not had any kernel panics with the BeFS module, and how I had recovered my data when guess what happened? Yes, my music stopped, the screen stopped redrawing, and my keyboard did the “flashing all the lights” thing.

I’m not sure exactly what caused this kernel panic, but thankfully I don’t seem to have lost any data. The floppy drive has been on the way out for a while, and when I rebooted it was making a very sickening screeching noise. I’ve unplugged it for now, and I think I’ll get a replacement. So was it the module, the floppy, or something else entirely? Not sure…

April 7th, 2003

Digital Archaeology, or how I rescued my email

Many moons ago (approximately sixty by my reckoning) I bought myself a new computer, and having very carefully selected hardware that was supported, I installed BeOS. It was a fun, fast, life enriching operating system that was blazing the trail to a bright future. It also had very few applications that ran on it, and tended to crash rather a lot, particularly when web browsing.

BeOS came with its own disk file system (BFS), its own way of handling email (single file per message), its own “almost a database” way of organising data, and many other fancy features. As the fortunes of the startup behind the o/s waned, and Be Inc started laying off staff and changing direction, I started looking for an alternative.

I ended up choosing Linux, and found myself on a frustrating, slow, life shortening operating system that had many unfinished applications, and a web browser that tended to crash a lot. Thankfully as time progressed Linux has improved in leaps and bounds, to the point where there are lots of finished applications and browsing the web almost never leads to crashes.

During my migration to Linux I kept my existing BeOS installation to one side, thinking that one day I must really go back and retrieve my data off it. Several hardware upgrades later however, and I found that I couldn’t boot into BeOS anymore. I retrieved the boot CD and floppy from the other side of the Atlantic, and found that I still couldn’t boot into BeOS. So much for my data…

A couple of weekends ago I found a Linux module that handles BFS (or BeFS, so as not to be confused with the other BFS that’s out there…). I compiled it, installed it, and tried to mount my BeOS partition. It worked! No kernel panic, no errors loading the module, just a mounted file system with all my data sat there.

Since then I’ve been going through my old BeOS system and pulling out various parts of it that I would like to keep around. I also discovered some things that I had forgotten doing, like writing a POP3 client to handle downloading email (written as a workaround for a bug in the client that shipped with the system).

Among the old data were all my old emails (a little over five thousand of them), but they were stored in a format that Evolution (my email client) refuses to read. Thankfully the format is very simple (full email text as a single file), and the mbox format that Evolution does understand is equally simple. I’ve written a tiny Python script to convert BeOS Mail to mbox format, and after a few iterations to shake out the bugs it has worked well enough to restore all my old email.
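For anyone facing the same job, here is a minimal sketch of that kind of conversion. It is not the script I actually ran: it assumes a modern Python with the standard `mailbox` module, and the function name `convert_to_mbox` and the "one directory full of message files" layout are made up for illustration.

```python
import mailbox
from email import message_from_bytes
from pathlib import Path


def convert_to_mbox(message_dir, mbox_path):
    """Append every file in message_dir to an mbox file; return the count.

    Assumes each file holds the full text of exactly one email.
    """
    box = mailbox.mbox(str(mbox_path))
    count = 0
    box.lock()
    try:
        for path in sorted(Path(message_dir).iterdir()):
            if not path.is_file():
                continue
            # Parse the raw bytes so odd encodings survive untouched.
            message = message_from_bytes(path.read_bytes())
            # mbox.add() writes the "From " separator line for us.
            box.add(message)
            count += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return count
```

Calling `convert_to_mbox("old_mail", "old_mail.mbox")` (both paths hypothetical) should leave a single file that an mbox-aware client can import.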

I’ve now got just over 19,000 emails in my system, dating back to June 1998, and hopefully I’ll be able to keep these around for many years to come. Just remember when transitioning systems that you need to move your data over as soon as is possible, because it only gets harder as time goes on…

April 5th, 2003

Free RSS Aggregator Released!

I’ve finished packaging together my RSS Aggregator. It’s at a point where you can use it on an every day basis without hacking code or fiddling with the database.

I’m releasing it on the off chance that someone else might need software that does a similar sort of thing. It would be a shame for two people to have to write it!

If you are curious as to what it looks like, here’s a screen shot of my “recently updated articles” page.

April 4th, 2003

Deploying LAMP – and holding back the flooding

Deploying a LAMP (Linux, Apache, MySQL, Python/Perl) application is difficult. I’ve just put together the briefest description of how to install my web-based, multi-user, RSS aggregation application – and frankly it requires a Unix administrator to do it. I knew it would be difficult (I wrote this for myself, I’m just planning on releasing it on the off chance that someone else might want/need a similar thing), but when you finally write a document which describes the steps it’s driven home.

For a start there are eight different software packages that it depends on, although it’s a fair guess that four of them are installed by the distribution of Linux you are using (in theory this is cross platform, but that’s just one complication too far). Then there is database creation, schema creation, basic configuration data setup, the apache configuration, and finally the application configuration. Then you can log-in to the system and start using it…

I see that there is going to be an attempt to stop the worst of the flooding that happens to Venice. I don’t know enough about the politics and plans surrounding this to comment on the significance of this particular announcement, but it does raise a thought. I wonder how many other dynamic flood defences like this exist in Europe? I know about the Thames Barrier, but there are probably others…

April 2nd, 2003

More thoughts on BitTorrent

Firstly it should be made clear that BitTorrent itself is not a piracy tool. It has many perfectly legitimate uses for transferring large files whose author has given permission for such free distribution. Having said that there do appear to be many easily accessible sites, such as this and this, that are hosting the information required to get access to TV series, films, and music which can not be legally distributed freely.

These sites only hold the .torrent files, which, as I explained in an earlier post, do not actually contain the copyrighted material. They instead point to a central server, which in turn keeps track of those IP addresses that are involved in distributing the material. It’s surprising that these sites have not been taken down yet. They are not hard to find, and while not many people have the time or bandwidth to download ~1GB files, the number who can is growing steadily.

It’s possible that, if the owners of one of these sites actually had the money available to take such a matter to court, there would be some countries where the hosting of these .torrent files would be found to be legal. They do not after all tell you directly where copyrighted material can be found, they simply point to an IP address that in turn lists people who do have such material. In most places this argument would probably fail, but you only really need one or two jurisdictions in which it’s legal to host these files, and they will continue to be available.

Those running trackers are far more vulnerable, as they are the closest thing to the central server used by Napster. The major difference is that while Napster had one central location that everyone knew about, with BitTorrent you can have many different trackers managing different or overlapping sets of files.

This means that while individual legal victories might be had at any level of the BitTorrent architecture (torrent hosts, trackers, or peer-to-peer clients), it would be very hard to stop the distribution of copyrighted material this way. However, taking action against the torrent hosts would slow down the spread of such material, pushing the location of .torrents underground onto IRC and other such networks. Ensuring that getting the material is more difficult than a search on Google would be at least a tactical victory for those trying to suppress the free distribution of copyrighted material.

April 2nd, 2003

Games of chance

Both Sunday evening and last night were spent playing various Cheap Ass games, and on the off chance that you have never heard of them before, be assured that they are great fun indeed. One of the new ones that we picked up at Ad Astra is Witch Trial, a fun card game with significant gambling elements, and a touch of role play to keep things interesting.

The premise of the game is that you are a lawyer during the witch trials in the US, and you are out to make money by prosecuting and defending cases. The play is varied enough that I think we’ll come back to playing it many times again in the future, joining Kill Doctor Lucky as a classic.

April 1st, 2003

BitTorrent and distributing large files

I’ve heard about BitTorrent before, but it was only today that I saw a great example of how it can change the nature of distribution of large files on the Internet. Red Hat released ISOs of version 9 of their Linux distribution to paying subscribers, and someone (legally) made them available through BitTorrent and announced their availability on Slashdot. The result was that people could get hold of the ISOs through the peer-to-peer swarm more quickly than they could through the overloaded FTP site.

The reason the peer-to-peer network was faster is down to the way BitTorrent works, which is that each downloading client also becomes a provider of the file. A major strength of BitTorrent is that the downloading client doesn’t have to complete the download before it can offer uploads; whatever portions have already been downloaded are made available for upload to others in the swarm that might need them.

The architecture consists of three main components. The .torrent file contains a description of the file (or directory) that is to be downloaded, including name, file size, and a secure hash of each chunk of the file. It additionally contains the URL of a BitTorrent tracker.
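To make the .torrent description a little more concrete, here is a rough sketch of the information it carries. It uses the key names and the SHA-1 hash that BitTorrent uses, but it builds a plain Python dictionary rather than the encoded file a real client reads, and the function names and the default chunk size are my own choices for illustration.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Return the SHA-1 digest of each fixed-size chunk of data."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def describe(name, data, tracker_url, piece_length=262144):
    """Build a dictionary with the fields a .torrent file carries.

    A real .torrent stores this structure in an encoded form, which
    this sketch leaves out.
    """
    return {
        "announce": tracker_url,          # URL of the tracker
        "info": {
            "name": name,
            "length": len(data),          # file size in bytes
            "piece length": piece_length,
            # All chunk hashes joined end to end, 20 bytes each.
            "pieces": b"".join(piece_hashes(data, piece_length)),
        },
    }
```

A downloader can hash each chunk it receives and compare it with the matching 20 bytes, which is what lets it accept data from strangers in the swarm.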

The tracker maintains a list of peers currently involved in transferring a particular file (or directory), as well as some stats around what each peer is up to. The client, after parsing the .torrent file, connects to the tracker and gets the list of peers in the swarm. The client then contacts peers from this list directly, offering up portions of the file that the client already has, and asking for portions that it requires.

There is some load balancing to ensure that clients are uploading their fair share: you get faster downloads the more bandwidth you can provide on upload, and multiple downloads are performed at once (so enabling modem users to make a real contribution of bandwidth even to those on broadband connections).

It’s an excellent way of distributing large files without having to foot a huge bandwidth bill.

March 31st, 2003

Unicode and databases

I updated my RSS Aggregator this weekend to make it distinguish between changes to posts and new posts. Originally it would compare the title and description of every RSS item that it read in with those already in the database (via a checksum for performance reasons). A problem I kept encountering was that some items would be updated several times after they first appeared, and so my aggregator would treat them as new posts.

Now I use the <guid> element, if it is present, to distinguish unique items; if it is not present, I use the title and link of the item. If the description of the item has been updated since the last time it was read, I update the version in my database, but leave the date of discovery the same so that the reverse chronological ordering isn’t affected. While doing this I encountered a problem when pulling data out of MySQL.
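The matching rule is easy to sketch. This is an illustration rather than the aggregator’s real code: a plain dictionary stands in for the MySQL table, and the names `item_key` and `store_item` are invented for the example.

```python
def item_key(item):
    """Identify an item by its guid, falling back to title and link."""
    guid = item.get("guid")
    if guid:
        return ("guid", guid)
    return ("title+link", item.get("title", ""), item.get("link", ""))


def store_item(db, item, now):
    """Insert or update an item in db; return 'new', 'updated' or 'unchanged'.

    db is a dict standing in for the database table.
    """
    key = item_key(item)
    description = item.get("description", "")
    existing = db.get(key)
    if existing is None:
        db[key] = {"description": description, "discovered": now}
        return "new"
    if existing["description"] != description:
        # Keep the original discovery date so the ordering is unaffected.
        existing["description"] = description
        return "updated"
    return "unchanged"
```

Reading the same item twice with a changed description reports it as updated, and its discovery date stays where it was.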

The problem is that the python module I use to access mysql (MySQL for Python), while happy to accept Unicode strings as parameters, will present any data retrieved from the database as a plain string. When doing a comparison between the Unicode extracted from the RSS feed and the results from the database query Python attempted to convert the string to Unicode, treating it as ASCII, which would cause an error if it contained latin1 characters.
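The failure is easy to reproduce in isolation. The sketch below is written for a current Python, where byte strings and text are separate types and nothing is converted implicitly, so it decodes explicitly to show the same ASCII error. The helper `as_text` and the sample value are made up, and nothing here calls the MySQL module itself.

```python
def as_text(value, encoding="latin-1"):
    """Decode bytes from the database; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


from_feed = "caf\u00e9"                    # text parsed from the RSS feed
from_db = "caf\u00e9".encode("latin-1")    # bytes as a driver might return them

# Treating the stored bytes as ASCII fails on the latin1 character.
try:
    from_db.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the character set the data was stored in makes the
# comparison against the feed text work.
same = as_text(from_db) == from_feed
```

The fix on my side amounts to the same thing: decode whatever comes back from the database using the character set it was stored in before comparing.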

Unfortunately MySQL doesn’t seem to support the storage of Unicode (certainly not at version 3.23.49); you have to store your strings in a particular character set. This will work fine for me (latin1 will cover everything I need), but I can’t see how it would work if you subscribed to two RSS feeds, say one in big5 and one in latin1. The documentation for version 4.1 states that it adds “Extensive Unicode (UTF8) support.”, so hopefully once it makes it into Debian stable this problem will go away…

March 29th, 2003

The passing of Citron

I carry bad news with these words. Citron, our favourite restaurant in Toronto, has passed away. It is no more, replaced physically but not in culinary terms by a third version of the Butler’s Pantry. Its friendly staff, great selection of new world and fusion dishes, and delicious desserts will be most sorely missed. Citron was a great little restaurant for spending the entire evening in, relaxing with a bottle of wine and conversing over great food, with no worry about the passing of time. They updated their menu a few times during the year with the passing of the seasons, making it hard to tire of their offerings as it is so easy to do with favourite eateries.

March 28th, 2003

SARS in Ontario

The spread of SARS is being particularly felt in Ontario this week. We have had the request for voluntary quarantine of all those who had visited Toronto’s Scarborough Grace Hospital on or after the 16th of March. It’s estimated that this will affect thousands, although how many will actually place themselves into quarantine for ten days is questionable. In fact it seems like the perfect cover for ten days’ sick leave – “Boss, I got this tan in quarantine!”.

Today it’s been announced that starting this weekend there will be screening of passengers at the airport to try and limit the further import and export of the disease. The total number of cases around the world, broken down by country, and other interesting information can be found at the WHO site. Currently it stands at 53 deaths and 1485 total cases.

Copyright 2015 Colin Stewart

Email: colin at owlfish.com