- CPU: Each transform applied to the graphics state resulted in two matrix multiplies. In many cases my code was stacking up to 5 transforms before rendering anything (scale, rotate, translate, rotate again, scale again...). This showed up as a CPU hot spot. I greatly reduced it by composing the matrices locally and then sending only the final matrix into the graphics context.
- Memory: Every text operation (DrawString, MeasureString) allocated several temporary StringFormat objects. Just a few seconds of panning around the map would generate 3 MB of them: more than even temporary strings, and at least an order of magnitude more than any other source of memory churn. They would all be GC'd eventually, but under load this could have pushed the site past the application memory pool limit, triggering a reset. I fixed this by changing PDFsharp to reuse the default objects, and reported the issue to the maintainers.
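As a rough illustration of the first fix (composing matrices locally instead of pushing each transform into the graphics state), here is a minimal Python sketch. It assumes 2D affine transforms represented as 3x3 matrices acting on column vectors; the helper names (`scale`, `rotate`, `translate`, `compose`, `apply`) are hypothetical and are not the GDI+ or PDFsharp API.

```python
import math

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def multiply(a, b):
    """Return the 3x3 matrix product a @ b."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def scale(sx, sy):
    return ((sx, 0.0, 0.0), (0.0, sy, 0.0), (0.0, 0.0, 1.0))


def rotate(radians):
    c, s = math.cos(radians), math.sin(radians)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def translate(dx, dy):
    return ((1.0, 0.0, dx), (0.0, 1.0, dy), (0.0, 0.0, 1.0))


def compose(*transforms):
    """Fold a stack of transforms into one matrix.

    With column vectors, compose(A, B) == A @ B, so B is applied to a point
    first and A last (the first argument is the outermost transform).
    """
    result = IDENTITY
    for t in transforms:
        result = multiply(result, t)
    return result


def apply(m, x, y):
    """Transform the point (x, y) by the affine matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Hypothetical five-step stack like the one described above: compose it
# locally, then hand only `final` to the graphics context in a single call.
stack = [scale(2, 2), rotate(math.pi / 6), translate(10, -5),
         rotate(-math.pi / 6), scale(0.5, 0.5)]
final = compose(*stack)
```

The composition work is the same either way; the saving in this sketch comes from doing it once in plain code and touching the graphics state only once per draw.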
The remaining major CPU hot spot is, unsurprisingly, clip path intersection for borders and sector boundaries, which is probably the most subtle and complex part of the map rendering process. Unfortunately, it will be hard to optimize directly, but I can probably reduce how often it is needed in the first place, e.g. by more aggressively pruning which borders to render for a tile, and by only clipping to sector edges when a border actually extends outside its sector.
I'm hoping that the above fixes will result in higher site availability. I'll be watching my monitoring service (Pingdom) to see whether they make a material difference.
UPDATE (2013-04-07):
Just to show how helpful profiling tools can be: while investigating the border clip path logic, I found a flaw in the "selector" code, which determines which sectors are "in view" when rendering a tile. It was under-computing the number of sectors a tile overlaps. Fortunately, there is already a 1-sector "slop factor" (needed so that, e.g., routes or labels that cut across a sector boundary are still rendered), so this bug never manifested as missing content. Unfortunately, the corrected code meant that even more time was spent on border path clipping (c/o the ANTS CPU profiler).
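The selector's job can be sketched as mapping a tile's bounds to an inclusive range of sector indices and then padding that range by the slop factor. This Python sketch is not the site's actual code: it assumes sectors of 32 x 40 map units (a Traveller sector is 32 x 40 parsecs), tile bounds given as `left, top, width, height`, and a hypothetical `sectors_in_view` function. Using `math.floor`/`math.ceil` instead of truncating with `int()` matters for negative coordinates, which is one easy way to under-count overlapped sectors.

```python
import math

# Assumed sector size in map units: a Traveller sector is 32 x 40 parsecs.
SECTOR_WIDTH = 32
SECTOR_HEIGHT = 40


def sectors_in_view(left: float, top: float, width: float, height: float,
                    slop: int = 1) -> list[tuple[int, int]]:
    """Return (sx, sy) indices of every sector overlapping the tile rectangle,
    padded by `slop` sectors on each side."""
    # floor() rounds toward negative infinity; int() would truncate toward
    # zero and put a tile starting at x = -1 in sector 0 instead of sector -1.
    min_sx = math.floor(left / SECTOR_WIDTH)
    min_sy = math.floor(top / SECTOR_HEIGHT)
    # Right/bottom edges are treated as exclusive, so a tile ending exactly on
    # a sector boundary does not pull in the next sector.
    max_sx = math.ceil((left + width) / SECTOR_WIDTH) - 1
    max_sy = math.ceil((top + height) / SECTOR_HEIGHT) - 1
    return [(sx, sy)
            for sy in range(min_sy - slop, max_sy + slop + 1)
            for sx in range(min_sx - slop, max_sx + slop + 1)]


# A tile straddling x = 0 touches two sectors before any slop is added.
print(sectors_in_view(-1, 0, 2, 40, slop=0))  # [(-1, 0), (0, 0)]
```

With the default slop of 1, even a tile that sits inside a single sector selects a 3 x 3 block of sectors, which is why every extra selected sector's border clipping shows up in the profile.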
Borders are a special case, though: they are always clipped to the sector bounds anyway. So rather than relying on the expensive clipping operations to do the right thing, I added a simple bounds-intersection test. This cut border rendering from nearly 55% of a typical tile rendering pass to only about 15%, now below the time spent drawing worlds (about 22%). This should also help with site stability.
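One way to read the bounds-intersection test: since every border is clipped to its sector's bounds, a sector whose rectangle does not overlap the tile cannot contribute any visible border, so its clip step can be skipped entirely. The Python sketch below assumes axis-aligned rectangles in map coordinates and a hypothetical `sectors_needing_borders` helper; it illustrates the idea and is not the site's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in map coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def intersects(self, other: "Rect") -> bool:
        # Two axis-aligned rectangles overlap unless one lies entirely to one
        # side of the other. Strict comparisons treat a shared edge as no overlap.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


def sectors_needing_borders(sector_bounds: dict[str, Rect], tile: Rect) -> list[str]:
    """Return names of sectors whose bounds overlap the tile.

    Borders are clipped to their sector's bounds, so a sector that misses the
    tile cannot contribute visible borders and its costly clip can be skipped.
    """
    return [name for name, bounds in sector_bounds.items()
            if bounds.intersects(tile)]


tile = Rect(0, 0, 32, 40)
sectors = {"a": Rect(0, 0, 32, 40), "b": Rect(32, 0, 64, 40), "d": Rect(16, 20, 48, 60)}
print(sectors_needing_borders(sectors, tile))  # ['a', 'd']
```

The test costs four comparisons per sector, so it is cheap enough to run before every clip; if border strokes that sit exactly on a shared edge need to be drawn, switching to `<=` comparisons would include edge-touching sectors.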