Colin's Journal: A place for thoughts about politics, software, and daily life.
Here are some links to a few tech articles that have caught my eye over the last few days. First up, a problem with RSS – it seems that lots of sites out there are not producing valid XML for their RSS feeds, and so aggregators are being modified to handle not just well-formed XML but malformed XML as well. An article by Mark explains why this is happening, but offers no ideas on how to deal with it.
Why should anyone care whether their RSS feeds are valid XML? If they are valid XML, they can be read by any program with a standard XML parser. If they are not, they can only be read by programs written to tolerate the breakage, and so the cost of software rises (fewer features, because people spend their time writing parsers to handle bad XML, or higher prices to cover the extra effort). What was really surprising about the article (on xml.com) was the note that even Scripting News occasionally publishes bad XML – a site run by someone responsible for one of the most popular RSS aggregators in use! There really is no excuse for this lack of quality in RSS feeds: XML processing tools are freely available and easy to use, so why do people insist on rolling their own that don’t work?
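To show just how little effort a well-formedness check takes, here is a minimal sketch using nothing but Python’s standard library; the sample feed strings are invented for the example:

```python
# Minimal well-formedness check with the Python standard library.
import xml.dom.minidom
from xml.parsers.expat import ExpatError

def is_well_formed(feed_text):
    """Return True if feed_text parses as XML, False otherwise."""
    try:
        xml.dom.minidom.parseString(feed_text)
        return True
    except ExpatError as err:
        print("Not well-formed XML:", err)
        return False

good = "<rss><channel><title>Example</title></channel></rss>"
bad = "<rss><channel><title>Example & broken</title></channel></rss>"  # bare '&' is illegal

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

Anything a standard parser rejects like this is something other aggregators will also struggle with, so checking a feed before publishing it costs almost nothing.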
Another story, this time an interview on the art of programming and how it might be improved (via Slashdot). It’s a very theoretical discussion, but an interesting one that has some relevance to my previous thoughts on RSS. The idea expressed is that programming doesn’t scale well to large systems because a small bug in one piece can cause a large failure, rather than a failure on the scale of the original defect. The solution proposed is that systems should communicate using pattern recognition rather than defined protocols. This approach would endorse the idea of XML parsers handling bad XML rather than complaining: software modules should extract whatever information they can from what they are given, rather than demanding that it match a well-defined protocol.
The alternative I would promote instead is that software should demand that all communication be done using well-defined protocols, but that it should make no assumptions about what the information means to others, nor care about any extra information that may be present. In practice this means that software should demand valid XML, and then extract from that XML whatever it finds interesting and ignore the rest. A bug in a software module is then localised to a specific set of information: the rest of the system carries on running, and only modules that rely on that piece of information are affected.
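A rough sketch of what I mean, again in Python and with an invented feed: the consumer insists on well-formed XML up front, then pulls out only the RSS elements it cares about (item titles and links here) and silently ignores anything extra.

```python
# Sketch of "demand valid XML, extract what you need, ignore the rest".
import xml.dom.minidom
from xml.parsers.expat import ExpatError

def extract_items(feed_text):
    """Parse strictly, then pull out only item titles and links."""
    try:
        doc = xml.dom.minidom.parseString(feed_text)
    except ExpatError:
        raise ValueError("Refusing to guess: feed is not well-formed XML")

    items = []
    for item in doc.getElementsByTagName("item"):
        entry = {}
        for name in ("title", "link"):
            nodes = item.getElementsByTagName(name)
            if nodes and nodes[0].firstChild is not None:
                entry[name] = nodes[0].firstChild.data
        items.append(entry)
    return items

feed = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <someExtension>ignored by this consumer</someExtension>
  <item><title>First post</title><link>http://example.com/1</link></item>
</channel></rss>"""

print(extract_items(feed))  # [{'title': 'First post', 'link': 'http://example.com/1'}]
```

The strictness is at the boundary only: a broken feed fails loudly and immediately, while unknown or extra elements cost this consumer nothing.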
Finally, as most people reading this will already have found out first hand, the Internet was struggling today thanks to the spread of an SQL Server worm. The thing this highlighted for me was not the number of people running unpatched versions of the software (not unexpected), but rather the number of people who have made their databases directly accessible from the Internet. There seems little reason why anyone would do this, yet the sheer volume of traffic generated by the worm shows that a very large number of people have databases running open on the network. It’s also a classic example of a small defect in one module having a disproportionately large effect on the whole system. It would be relatively easy for networking switches and firewalls to match patterns of network usage that could be deemed ‘unusual’ and drop packets that fall into this category. If this is what Jaron Lanier is referring to in his interview then I can see what he means, but I would think of it as just robust programming, rather than a huge change in how we think about software.
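To make the ‘match unusual patterns and drop them’ idea concrete, here is a toy rate-limiting sketch in Python. The threshold, window and traffic figures are all invented, and a real switch or firewall would do this in hardware or with dedicated rules rather than application code, but the principle is the same:

```python
# Toy sketch of an 'unusual traffic' check: drop packets from any source that
# sends more than RATE_LIMIT packets to one destination port within a
# one-second window. Figures and traffic are made up for illustration.
from collections import defaultdict, deque

RATE_LIMIT = 100   # packets per second per (source, port) - invented threshold
WINDOW = 1.0       # seconds

recent = defaultdict(deque)   # (src_ip, dst_port) -> timestamps of recent packets

def allow(timestamp, src_ip, dst_port):
    """Return True if the packet should be forwarded, False if dropped."""
    history = recent[(src_ip, dst_port)]
    # Discard timestamps that have fallen outside the window.
    while history and timestamp - history[0] > WINDOW:
        history.popleft()
    history.append(timestamp)
    return len(history) <= RATE_LIMIT

# A burst like the worm's (a flood of UDP packets to port 1434 from one host)
# is quickly throttled, while ordinary traffic passes untouched.
dropped = sum(1 for i in range(1000) if not allow(i * 0.001, "10.0.0.5", 1434))
print(dropped, "of 1000 packets dropped")
```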