Colin's Journal: A place for thoughts about politics, software, and daily life.
Yesterday I thought I would take a look at the performance of SimpleTAL, and look to see if there were any easy ways of improving it. I took a small (one screen full) template consisting of lots of ordinary text, a repeat command, and a couple of content commands, and timed SimpleTAL expanding it 200 times. The result was around 5 templates/sec.
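A minimal sketch of that kind of measurement, assuming nothing about SimpleTAL's real API: `fake_expand` is a hypothetical stand-in for "expand the template once", and the helper simply times 200 calls and reports templates per second.

```python
import time


def templates_per_second(expand, runs=200):
    """Call expand() `runs` times and return the achieved rate per second."""
    start = time.perf_counter()
    for _ in range(runs):
        expand()
    elapsed = time.perf_counter() - start
    return runs / elapsed


def fake_expand():
    # Hypothetical stand-in for a real template expansion.
    return "".join("<li>%d</li>" % i for i in range(1000))


rate = templates_per_second(fake_expand, runs=200)
print("%.1f templates/sec" % rate)
```

Swapping `fake_expand` for a function that expands a real template would give a figure comparable to the ones quoted here.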
I had an idea of pre-parsing the template and turning it into a series of events (start tag, data, and end tag). I implemented this fairly quickly, and found that performance improved up to the 11 templates/sec mark. I know, however, that Zope’s TAL engine can go significantly faster than this, so I started looking at it again and trying to work out how I could improve things significantly.
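The pre-parsing idea can be sketched roughly as follows. This is an illustration only, using the standard library's `html.parser` rather than whatever parser SimpleTAL actually uses; `EventRecorder` and `preparse` are hypothetical names.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record a template as a flat list of (kind, payload) events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", (tag, dict(attrs))))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def preparse(template_text):
    """Parse the template once; the event list can then be replayed many times."""
    recorder = EventRecorder()
    recorder.feed(template_text)
    recorder.close()
    return recorder.events


events = preparse('<p tal:content="title">placeholder</p>')
for event in events:
    print(event)
```

The point of the design is that the cost of parsing is paid once, and each expansion only has to walk the recorded list.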
The current SimpleTAL implementation uses OO methodology fairly heavily. This means that for each tag in the template an object is created, along with at least one handler object (often more). The tag is then passed to each handler, which does various things to it based on the evaluated expressions coming back from the simpleTALES module. The result is that for a given run of the template, even with the HTML/XML parsing done beforehand, there is a significant amount of object creation (expensive), a large number of method calls rather than variable access (expensive) and text manipulation/parsing.
The Zope way of getting around this is to parse the template into an intermediate byte code. This byte code is then used by an interpreter to generate the template, with very little in the way of object creation. I’m now re-factoring SimpleTAL in a similar way to see how much improvement I can get, and so far it’s looking promising. I’m still a long way from finishing, but I have content and repeat working well enough to run my performance template, and the result is now around 90 templates/sec – roughly 18 times the original speed, or a near 95% reduction in the time taken per template! The unfortunate side effect though is that the code is harder to understand because it’s data structure driven instead of object driven, which will make maintaining the code a lot harder.
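To give a flavour of the data-structure-driven approach, here is a toy interpreter. The opcodes (`TEXT`, `CONTENT`, `REPEAT`) and the program layout are invented for illustration; they are not SimpleTAL's or Zope's actual instruction set.

```python
# Hypothetical opcodes, made up for this sketch.
TEXT, CONTENT, REPEAT = range(3)


def run(program, context, out):
    """Interpret a pre-compiled program, appending output strings to out."""
    for instruction in program:
        opcode = instruction[0]
        if opcode == TEXT:
            out.append(instruction[1])
        elif opcode == CONTENT:
            out.append(str(context[instruction[1]]))
        elif opcode == REPEAT:
            _, var_name, list_name, body = instruction
            for item in context[list_name]:
                # Bind the loop variable in a copy of the context.
                local = dict(context)
                local[var_name] = item
                run(body, local, out)


def expand(program, context):
    out = []
    run(program, context, out)
    return "".join(out)


program = [
    (TEXT, "<ul>"),
    (REPEAT, "item", "items", [(TEXT, "<li>"), (CONTENT, "item"), (TEXT, "</li>")]),
    (TEXT, "</ul>"),
]

print(expand(program, {"items": ["a", "b"]}))
```

Everything the interpreter needs is in plain tuples and lists, which is what keeps object creation low and also what makes the code harder to follow than a set of handler classes.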
We were on our way to the airport, on a long and indirect series of flights back home for Christmas, when I first saw our neighbourhood wind turbine. The fact that up to then we had failed to notice a 94 metre tall wind turbine protruding out of the cityscape indicates how far away from it we live. Since returning to Toronto I have spent the odd moment looking out over our deck to see whether or not we can see it from our flat, and thought several times that I maybe could see it, hiding behind a tree in the distance.
When I awoke this morning and glanced out over the deck I noticed, in the near distance, the turbine happily rotating away. It is indeed behind a tree some distance away, but quite visible when it’s turning. Apparently it cost approximately C$1.2M to build (it’s not clear if any maintenance is included in that figure) and will generate 1,800 MWh of electricity a year, which at market rates (now fixed by the government at 4.3c/kWh for most consumers) could bring in $77K per year.
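The revenue figure can be checked with a couple of lines of arithmetic. The payback calculation is an extra, deliberately naive assumption: it ignores maintenance, financing and any change in the electricity price.

```python
annual_output_mwh = 1_800        # quoted annual generation
price_cad_per_kwh = 0.043        # 4.3c/kWh fixed consumer rate
build_cost_cad = 1_200_000       # approximate build cost

# 1 MWh = 1,000 kWh
annual_revenue_cad = annual_output_mwh * 1_000 * price_cad_per_kwh

# Naive payback period: build cost divided by yearly revenue.
simple_payback_years = build_cost_cad / annual_revenue_cad

print("Revenue: about C$%.0f per year" % annual_revenue_cad)
print("Naive payback: about %.1f years" % simple_payback_years)
```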
As such it’s only a symbolic and public awareness development, but it seems that when a large site is developed the cost of wind generated electricity is pretty comparable to that of a new coal station and close to that of a modern gas fired station (see the excellent British wind energy site and this rather more independent FT article). Wind energy is only part of the answer (apparently only 10% of the UK’s electricity needs can be met this way before reliability becomes an issue), but it is still nice to have our own neighbourhood turbine, and even nicer to see how quickly wind power generation is being built.
I received an email this morning from Thomas Weholt which detailed an interesting problem he encountered when using SimpleTAL. The source of the problem turned out to be that the path resolution rules being used would match an attribute before looking at the mapping an object provides (more details here). I spent some time looking at what might be a good fix for this, and then found that Zope behaves the same way, so for now I’ve left the implementation as is.
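One way such a clash can show up is sketched below. This is not SimpleTAL's code; `resolve_step` and `resolve_path` are hypothetical helpers that simply try an attribute lookup before a mapping lookup, which is the ordering described above.

```python
def resolve_step(obj, name):
    """Resolve one path step: attribute first, then mapping lookup."""
    if hasattr(obj, name):
        return getattr(obj, name)
    return obj[name]


def resolve_path(root, path):
    """Walk a slash-separated path such as 'page/title' from root."""
    current = root
    for name in path.split("/"):
        current = resolve_step(current, name)
    return current


context = {"page": {"title": "Home", "keys": "value stored under 'keys'"}}

# 'title' is not an attribute of dict, so the mapping entry is found.
print(resolve_path(context, "page/title"))

# 'keys' is a dict method, so the attribute wins and hides the mapping entry.
print(resolve_path(context, "page/keys"))
```

Any mapping key that happens to share a name with an attribute or method of the object becomes unreachable through the path.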
The research however got me looking at another potential problem: when content is included using the “structure” keyword any TAL attributes included will be expanded. This allows for some very cool and interesting things, but it does present a problem when you need to display user input strings using structure. The problem is that the user’s input has access to all of the attributes of all objects that are included in the context, which is a potential security problem. I was in two minds as to whether or not I should provide a way of disabling this, so I again checked on how Zope handled this situation, and I found that it would not expand TAL included in this fashion. So that both behaviours are available I’ve now added an “allowTALInStructure” parameter which will control whether any TAL found in “structure” content will be expanded. I also found, during the creation of some unit test cases for XML templates, that SimpleTAL 1.0 could not handle content included using “structure”. Thankfully that turned out to be a one line fix.
The end result is that I’ve just uploaded version 1.1 of SimpleTAL. I’ve run through all of the unit test cases I have, and compared the results of my weblog program using the new version to the old, and everything seems to still work.
Here are some links to a few tech articles that have caught my eye over the last few days. First up, a problem with RSS – it seems that lots of sites out there are not creating valid XML files for their RSS feeds, and so aggregators are being modified to handle not just XML, but malformed XML as well. An article by Mark explains why this is happening, but provides no ideas on how to deal with it.
Why should anyone care whether their RSS feeds are valid XML? Well, if they are valid XML files it means that they can be used by other programs. If they are not valid then they can only be used by certain programs, and so the cost of software rises (fewer features because people are spending their time writing parsers to handle bad XML, or more costly to cover the extra effort). What was really surprising about the article (on xml.com) was to note that even Scripting News, a site run by someone who is responsible for one of the most popular RSS aggregators, occasionally publishes bad XML! There really is no excuse for this lack of quality in RSS feeds: XML processing tools are freely available and easy to use, so why do people insist on rolling their own that don’t work?
Another story, this time an interview on the art of programming, and how it might be improved (via Slashdot). It’s a very theoretical discussion, but an interesting one that has some relevance to my previous thoughts on RSS. The idea expressed is that programming doesn’t scale to large systems well because you only need a small bug in one piece to cause a large failure, rather than a failure that is on the scale of the original defect. The solution proposed is that systems should communicate using pattern recognition rather than via defined protocols. This approach would endorse the idea of having XML parsers handling bad XML rather than complaining; software modules should extract whatever information they can out of what they are given rather than demanding that it matches a well defined protocol.
An alternative that I would promote instead is that software should demand all communication be done using well defined protocols, but that it should make no assumptions as to what the information means to others, or care about any extra information that may be present. In practice this would mean that software should demand valid XML, and then it should extract from that XML whatever it finds interesting and ignore the rest. This means that a bug in a software module is localised to a specific set of information; the rest of the system carries on running, with only modules that rely on that piece of information affected.
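A small sketch of that approach using Python's standard `xml.etree.ElementTree`: the parser is strict about the XML itself, while the code only looks at the elements it cares about. The feeds and the `unknownExtension` element are made up for the example.

```python
import xml.etree.ElementTree as ET


def read_item_titles(feed_text):
    """Return the item titles from a feed; raises ET.ParseError on bad XML."""
    root = ET.fromstring(feed_text)  # strict: malformed XML is rejected
    # Extract only what this module finds interesting; ignore everything else.
    return [item.findtext("title", default="") for item in root.iter("item")]


good_feed = """<rss version="2.0"><channel>
  <title>Example</title>
  <item><title>First</title><unknownExtension>ignored</unknownExtension></item>
  <item><title>Second</title></item>
</channel></rss>"""

# Mismatched tags: <title> is closed by </item>.
bad_feed = "<rss><channel><item><title>Broken</item></channel></rss>"

print(read_item_titles(good_feed))

try:
    read_item_titles(bad_feed)
except ET.ParseError as error:
    print("Rejected malformed feed:", error)
```

The unknown element in the first item does no harm, because nothing asks for it, while the malformed feed is refused outright rather than guessed at.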
Finally, as most people reading this will already have found out first hand, the Internet was struggling today thanks to the spread of an SQL Server worm. The thing that this highlighted to me was not the number of people running unpatched versions of the software (not unexpected), but rather the number of people who have made their databases accessible from the Internet directly. There seems little reason why anyone would do this, but the sheer volume of traffic generated by this thing shows that a very large number of people indeed have databases running open on the network. It’s also a classic example of a small defect in one module having a disproportionately large effect on the whole system. It would be relatively easy for networking switches and firewalls to match patterns of network usage that could be deemed ‘unusual’ and so drop packets that fall into this category. If this is what Jaron Lanier is referring to in his interview then I can see what he means, but I would think of it as just robust programming, rather than a huge change in how we think about software.
A fairly good article by the BBC on the recent strengthening of the French/German alliance. The timing of these developments is interesting, and I’m not sure what to make of it. My personal reaction is to think about the current work of the convention on the future of the EU, and to consider that any constitutional arrangement will have to ensure that a French/German alliance does not dominate policy.
This is also likely to be the response of the leaders of the other members of the EU – and surely France and Germany know this. So could it be that this is exactly the response that the pair (or one of the pair) is looking for? If so why? I suppose it might push the federal cause a little further ahead, but I’m not sure it works that much. Another answer might be that they are trying to concentrate minds – France and Germany are moving forward on European integration, so other countries need to come forward with commitments on integration if they don’t want to be left behind.
Hopefully I’ll find some ideas on this out there somewhere…
It’s been rather cold out recently. It’s not cold in the British sense of “it’s been really cold recently, there was a frost on the ground this morning!”, rather it’s been cold in the “beware you don’t freeze to death on your way to work” sense. This morning it was around -20C and, according to Environment Canada, it’s currently -16C. That’s without the wind chill. Thankfully this morning there was little in the way of wind, but tonight there is enough to put the forecast at a wind chill of -35C.
So it’s cold. Despite this coldness however I noticed, on the way home from work, that there are still a couple of shops in Chinatown that have their shop fronts completely open. When I type “shop fronts” I really mean it – the whole front of the shop – open to the elements, which currently means -16C. The increasing costs of energy in Ontario don’t seem to be biting as hard as perhaps they should.
I’ve had some great feedback on my SimpleTAL library, and a few questions. The original pages that I put up were a little spartan, even by my standards, but I’ve been adding to them over the last couple of days to try and make them a little more informative. I’ve added a couple of examples that show how to use the library, and a page documenting the differences between this implementation of TAL and the Zope version.
It would be nice to add pages demonstrating each of the different TAL attributes and how they work, but it’s a fair amount of work, so for now I’m relying on the Zope documentation. An aspect of the documentation that I will work on however is a description of the SimpleTAL API. It is very easy to work out from the source, but it’s much nicer and easier to have it put into a web page instead.
One of my shoe laces broke this morning, leaving just enough lace to keep my shoe on my foot. At lunch I went to purchase a replacement shoe lace, and thankfully the local chemist had them. I was expecting that I would have to buy a pair of shoe laces, instead of the one that I needed, but I was wrong. I had, in fact, to purchase two pairs of shoe laces instead.
Shoe laces also come in multiple lengths, with a handy (inaccurate) chart on the packaging indicating what length you may need based on the number of eyelets your shoes have. Sod’s law – my shoes fall at the upper end of one length recommendation. Still I got the size indicated, and although they are a little on the shy side, they will do. The question remains however why you have to buy two pairs, with a single pair not being an option? How many people have two identically coloured pairs of shoes, with the same number of eyelets, that suffer broken laces at the same time? If shoe laces have to be sold four at a time, why can they not at least put two different sizes in the same packet, so that you can buy in the confidence that at least one of them will be correct?
The weblog system that I have put together is based on the use of a template language called TAL. TAL is part of Zope, the large Python based CMS, and it relies on various C modules that come as part of Zope. To use TAL I had to write my own implementation or work out a way of making the Zope version work without Zope (others have since done this using the original, but it’s not widely available).
In case this library is of any use to other people I’m putting it up on my website. If you’ve never heard of TAL and do CGI programming in Python, or have other needs for a simple template language for HTML and XML, then take a look. Start with the TAL link above, and then play with my implementation, SimpleTAL. If you like it then check out the rest of Zope.
I’m reading (or rather skimming) the UK Government’s consultation document on identity cards as I try and think of how I can compose a suitable email on the subject. If you’ve not done so already, and care about the subject, then take a look at the stand website.
While looking through the document I saw the table of minimum ages that you need to be before you can do certain things in the UK, and learnt that you have to be 17 not just to drive a car, but also to purchase a crossbow.
The full list of my published Software
Email: colin at owlfish.com