
Taking on stdlib

Posted on: 2015-02-17

A while back I implemented a Collada parser. Collada is an XML-based schema for interchange of 3D graphics assets. I had some fairly simple assets that had been exported, and as I imported them into my runtime I was surprised at how slow it was. My implementation had some interesting features. I had an XML parser that understood the basic XML structure - but you could add handlers for processing nodes with specialized tags. This let you optimize for both space and time when you knew a tag's contents were better handled specially. For example, much of a Collada file consists of long lists of integer or floating-point values. I would specialize these so that the resulting XML structure held those values already converted into floats/ints, as opposed to storing the string and converting later.
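The dispatch described above might look something like the sketch below - a table mapping tag names to handlers that receive the raw text content, so a `<float_array>` can be stored as converted floats immediately. All the names here are illustrative, not the original API, and the handler cheats by using `strtof` on a terminated string to keep the example about dispatch rather than parsing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A handler receives the tag's raw text and writes converted output. */
typedef void (*tag_handler)(const char *text, size_t len, void *out);

typedef struct {
    const char *tag;
    tag_handler fn;
} handler_entry;

/* Specialized handler for <float_array>: convert values immediately
   instead of keeping the text around. (Assumes a terminated string
   here for brevity.) */
static void handle_float_array(const char *text, size_t len, void *out)
{
    (void)len;
    float *dst = (float *)out;
    char *end;
    for (const char *p = text; *p; p = end) {
        float v = strtof(p, &end);   /* skips leading whitespace */
        if (end == p)
            break;                   /* no more numbers */
        *dst++ = v;
    }
}

static const handler_entry handlers[] = {
    { "float_array", handle_float_array },
};

/* The generic XML parser would consult this before falling back to
   storing plain text. */
static tag_handler find_handler(const char *tag)
{
    for (size_t i = 0; i < sizeof handlers / sizeof handlers[0]; i++)
        if (strcmp(handlers[i].tag, tag) == 0)
            return handlers[i].fn;
    return NULL;
}
```

Tags without a registered handler just fall through to the generic text path, so the specialization is pay-as-you-go.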

When profiling I found that the largest chunk of time spent reading the files was actually in the text->int and text->float C functions. That was a little disheartening, because you'd assume such often-used functions would be highly optimized in the C runtime library.

One reason for hope was that my string representation is not zero-terminated. This is for a variety of reasons - but mainly that it means you can slice strings without copying. It also meant that to use the C library's functions I had to copy the contents into a zero-terminated buffer first. If I rolled my own, I wouldn't have this extra overhead, modest as it might seem.
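A length-delimited string of this kind is just a pointer plus a length, so a sub-slice is pure pointer arithmetic into the original buffer - no allocation, no copy, and no terminator to write. A minimal sketch (the type and function names are mine, not the post's):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A non-owning, length-delimited view into a character buffer.
   No zero terminator is required or assumed. */
typedef struct {
    const char *ptr;
    size_t len;
} str_slice;

static str_slice slice_of(const char *s)
{
    str_slice sl = { s, strlen(s) };
    return sl;
}

/* Sub-slice: no allocation, no copy - just arithmetic on the view.
   Caller is responsible for start/len being in bounds. */
static str_slice sub_slice(str_slice s, size_t start, size_t len)
{
    str_slice r = { s.ptr + start, len };
    return r;
}
```

The cost is that any API expecting a C string (like `strtod`) forces a copy into a terminated scratch buffer, which is exactly the overhead a slice-aware converter avoids.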

The other reason for hope was that for floating point I don't have the same requirements as a full IEEE converter. For example, I don't need to support denormalized floats. In fact I'd actively prefer denormalized floats not to appear, because they crucify performance. Also, if a float representation is out of range I don't need to detect it or report an error, though I can include an assert in debug builds. Finally, I'm not as worried about exact rounding and precision as the C library has to be. The floats I'm reading are used for graphics/physics/games, after all.

So I wrote both text->int and text->float conversion functions in straight C. Both were fairly straightforward. In the float converter I pre-calculated the exponent table, and as I read the mantissa I adjusted the exponent power depending on where the decimal point fell.

Profiling on an i7-860, I found the floating-point function was on average 10 times as fast as the C library's, and the integer conversion around 5-6 times as fast. The improvements on ARM weren't as high, but were still considerable (around 6 times as fast for text->float, I think).

I was actually quite surprised by the size of the improvement. It significantly sped up my Collada importer, as well as other code that relied on text float/int conversions.

It goes to show that the conventional wisdom that "there's no point rolling your own stdlib functions because they'll have out-optimized you" isn't always true, and in this case is wildly off.