Colin's Journal: A place for thoughts about politics, software, and daily life.
I have finished porting SimpleTAL to Python 3. Release 5.0 of SimpleTAL targets Python 3.1 and provides functionality similar to what SimpleTAL 4.2 provides for Python 2.5. The differences between using 4.2 and 5.0 are documented on the SimpleTAL notes page.
At first the porting process was fairly easy. I started by getting all test cases to run cleanly under Python 2.6 with the -3 flag, and then ran 2to3 to convert the basic syntax. The next step was to run the test cases under Python 3 to highlight issues that required manual changes. The sgmllib module has been removed from the standard library, so I had to remove HTMLStructureCleaner from simpleTALUtils (it was unused within the library itself). The iterator protocol change from “next” to “__next__” meant my iterator-detecting code had to be updated.
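As a rough illustration of the iterator change (a minimal sketch, not the actual SimpleTAL code; the helper name is_iterator is made up for this example), a duck-typed check under Python 3 looks for “__next__” where Python 2 code would have looked for “next”:

```python
def is_iterator(obj):
    """Return True if obj follows the Python 3 iterator protocol.

    Hypothetical helper for illustration only. Python 2 code would
    have tested for a "next" method instead of "__next__".
    """
    return hasattr(obj, "__iter__") and hasattr(obj, "__next__")


# A list is iterable but is not itself an iterator.
print(is_iterator([1, 2, 3]))
# Calling iter() on it returns an object that has __next__.
print(is_iterator(iter([1, 2, 3])))
```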
The changes to character set handling in Python 3 required slightly more involved changes to the template handling. In Python 2.x the SimpleTAL library would handle all encoding and decoding itself, but in Python 3 this is not always required, as there is now a clean separation between bytes and strings.
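To show what that separation means in practice, here is a small sketch (the function to_text and its default encoding are assumptions for illustration, not part of the SimpleTAL API): text that is already a str needs no decoding, and only bytes have to be converted.

```python
def to_text(source, encoding="utf-8"):
    """Return template source as str, decoding only when given bytes.

    Hypothetical helper; the utf-8 default is an assumption.
    """
    if isinstance(source, bytes):
        # Raw bytes (e.g. read from a file opened in binary mode)
        # must be decoded explicitly.
        return source.decode(encoding)
    # Already a str: Python 3 strings are Unicode, nothing to do.
    return source


print(to_text(b"<p>caf\xc3\xa9</p>"))
print(to_text("<p>café</p>"))
```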
One issue that I hit when porting to Python 3 was the use of regular expressions. In order for SimpleTAL to pass through singleton XML elements from the template (i.e. <tag /> rather than <tag></tag>) it needs to carry out a regex check against the raw XML that the SAX library provides. This is done by retrieving the xml.sax.handler.property_xml_string property, which is documented as returning a string. In practice, however, the Python 3 SAX implementation returns bytes, which at first I assumed the regex library would not work with. A little bit of research later, I learned that the regex library can work on bytes as well, provided the pattern itself is also a bytes object.
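A minimal sketch of that kind of check follows. The pattern and the function name is_singleton are illustrative assumptions, not the expression SimpleTAL actually uses, and the pattern ignores edge cases such as a “>” inside an attribute value. The key point is that a bytes pattern is required to search bytes input; mixing a str pattern with bytes raises TypeError.

```python
import re

# Bytes pattern: matches raw element text that ends in "/>",
# i.e. a singleton such as b"<tag />". Illustrative only.
SINGLETON = re.compile(br"<[^<>]*/>\Z")


def is_singleton(raw):
    """Return True if the raw element text ends as a singleton tag.

    Accepts bytes (what the Python 3 SAX parser returned in practice)
    or str (what the documentation describes); str is encoded first
    so one bytes pattern covers both cases.
    """
    if isinstance(raw, str):
        raw = raw.encode("utf-8")
    return SINGLETON.search(raw) is not None


print(is_singleton(b"<tag />"))
print(is_singleton(b"<tag></tag>"))
```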
One final surprise was the huge performance gain from moving from Python 2.6 to 3.1. The SimpleTAL performance tests show a minimum speed increase of 60% (on the METAL test), with some tests clocking in at 90% increases. Both HTML and XML basic template expansions are now hitting over 1600 pages/sec on a single 1.7GHz CPU.
Email: colin at owlfish.com