September 15, 2009

Python, XML and Google Maps

By Chris Heiland

This was originally posted on Chris Heiland’s staff blog while he was a member of the Web Team. Chris migrated this post to the Web Team blog before he left because we thought the content still had value to the UW community.

Generating and organizing data for the campus maps project is a constant journey. I’ve switched between several XML formats, with interesting results. Finally, I have landed on a blend of formats that gives adequate performance and allows for updating.

I’ll probably split the entire process into several posts but here is the start. Originally I used the provided format that the GXml parser can understand. Basically it looks like this:

<marker attrib1="" attrib2="" />
<marker attrib1="" attrib2="" />

Lots of attributes, not many nodes. GXml actually deals with this fairly well as it can rip through the results quickly. If you need to build markers based on this then there are several side benefits, including allowing Google to cache the data for you.

Working under that assumption, I originally created the XML using xml.dom in Python with some custom classes. The actual implementation was a bit dirty/creative, since the Python classes themselves dictated the structure of the output. The logic ran through all the data, created the necessary objects, and then producing the output was a simple call to .toxml('utf-8'). Easy.
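A minimal sketch of that approach, assuming xml.dom.minidom from the standard library. The Marker class, the "markers" root element, and the attribute names and values are hypothetical placeholders, not the classes or data from the campus maps project.

```python
from xml.dom.minidom import Document


class Marker:
    """Hypothetical stand-in for one of the custom classes."""

    def __init__(self, **attributes):
        self.attributes = attributes

    def to_element(self, doc):
        # Every property becomes an attribute on a single <marker> element.
        element = doc.createElement("marker")
        for name, value in self.attributes.items():
            element.setAttribute(name, str(value))
        return element


def build_marker_xml(markers):
    doc = Document()
    root = doc.createElement("markers")  # assumed root element name
    doc.appendChild(root)
    for marker in markers:
        root.appendChild(marker.to_element(doc))
    # Passing an encoding makes toxml() return encoded bytes on Python 3.
    return doc.toxml("utf-8")


# Placeholder data for illustration only.
xml_bytes = build_marker_xml([Marker(name="Library", lat=47.0, lng=-122.0)])
print(xml_bytes.decode("utf-8"))
```

Because each object knows how to turn itself into an element, the final serialization stays a one-line call.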

Later on I needed a way to change the output to the following:


We needed the structure to be created out of key-value pairs in nodes instead of attributes. This seemed like an easy switch: I could still reuse the same classes, but instead of creating attributes I would create nodes. Well, kinda easy.
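As a rough illustration of the difference, here is a sketch that writes each key-value pair as a child node rather than an attribute, again assuming xml.dom.minidom. The append_pairs helper and the element names are made up for the example and are not the actual format the project used.

```python
from xml.dom.minidom import Document


def append_pairs(doc, parent, pairs):
    """Hypothetical helper: add one child element per key-value pair."""
    for key, value in pairs.items():
        child = doc.createElement(key)
        child.appendChild(doc.createTextNode(str(value)))
        parent.appendChild(child)


doc = Document()
root = doc.createElement("markers")  # assumed root element name
doc.appendChild(root)

marker = doc.createElement("marker")
root.appendChild(marker)

# Placeholder data for illustration only.
append_pairs(doc, marker, {"name": "Library", "lat": 47.0})

node_xml = doc.toxml("utf-8")
print(node_xml.decode("utf-8"))
```

The same data now produces one element per property, which is where the extra nodes come from.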

I had to create some additional classes to handle the nesting and then add more logic so the properties could get populated correctly. The short version is that everything switched over and was working. However, when I switched the map to use the new format I ran into major problems.

Basically I would have to rewrite a bunch of code and deal with some performance issues. I was parsing more nodes and had deeply nested loops to get to the organizations within each location. Not good when this would need to happen very often.

Back to the drawing board. I was already using jinja2 for another project, and it seemed like a good fit here: XML is just text, and templating engines deal with text well. After a bit of cleaning up the classes and updating the logic, everything is working great. I can switch between formats easily and suffer no performance loss.
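A sketch of how templates make the format switch cheap, assuming jinja2 is installed. The two templates, the "markers" root element, and the sample data are illustrative guesses, not the templates from the project. Autoescaping is turned on so that characters such as & in the data come out as valid XML.

```python
from jinja2 import Environment

# autoescape=True escapes &, <, > and quotes in rendered values.
env = Environment(autoescape=True)

# Attribute style: one <marker /> element per item.
ATTRIBUTE_TEMPLATE = env.from_string(
    "<markers>\n"
    "{% for marker in markers %}"
    '  <marker{% for key, value in marker.items() %} {{ key }}="{{ value }}"{% endfor %} />\n'
    "{% endfor %}"
    "</markers>"
)

# Node style: one child element per key-value pair.
NODE_TEMPLATE = env.from_string(
    "<markers>\n"
    "{% for marker in markers %}"
    "  <marker>\n"
    "{% for key, value in marker.items() %}"
    "    <{{ key }}>{{ value }}</{{ key }}>\n"
    "{% endfor %}"
    "  </marker>\n"
    "{% endfor %}"
    "</markers>"
)

# Placeholder data for illustration only.
markers = [{"name": "Arts & Sciences", "lat": 47.0}]

attribute_xml = ATTRIBUTE_TEMPLATE.render(markers=markers)
node_xml = NODE_TEMPLATE.render(markers=markers)

print(attribute_xml)
print(node_xml)
```

The data and the logic that collects it stay the same; only the template changes between formats.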

Oh, do I love Python.
