Undefined Signed

Posted on: 2016-05-23

I came across this blog post...

http://blog.libtorrent.org/2016/05/unsigned-integers/

I have slowly come around to the idea that using unsigned numbers across your code is probably a bad idea. The reasons why are somewhat convoluted, but the main nice thing about unsigned numbers is that they say something about the number a function is receiving - that I don't need to worry about it being negative.

It's not that simple though because C/C++ will happily promote from signed to unsigned, although it will likely give you a warning.

I have also come across on several occasions the underflow bug when using unsigned. Namely

// Whoops this is an infinite loop!
// because UInt(0) - 1 = MAX_UINT
for (UInt i = 10; i >= 0; i--)
{
    // do something
}

What I wasn't aware of was the potential performance implications - because the optimizer can optimize code using int instead of unsigned int. Why? Because int provides no guarentee on behavior for underflow or overflow. That being the case it can see as the number space as contiguous. That is int(0x7fffffff) + 1 is strictly speaking undefined in C/C++. That also means asking if (i + 1 > i), the compiler can assume i + 1 is not going to overflow. That assumption means it can optimize if (i + 1 > i) to just if (true). This is cool, and also interesting as there is a lot of code that does assume specific behavior with overflow and underflow on integrals - namely twos compliment wrap around.

Just to test out the theory I thought I'd compile a bit of code

#include <stdio.h>
#include <stdlib.h>

int main()
{
    // (Assuming 32 bit ints)
    // I want a random number - so the compiler can't optimize 
    // the test away. Xor'ed to force result to be 32 bit
    int a = rand() ^ (rand() << 16);
    if ( a + 1 > a)
    {
         printf("a + 1 > a\\n");
    }
    unsigned int b = a;
    if (b + 1 > b)
    {
         printf("b + 1 > b\\n");
    }
    return 0;
}

The first if - because a is an int can in principal be optimized away. The second cannot - as with an unsigned int, if b is 0xffffffff, then b + 1 is not greater than b.

Does this optimization happen in practice? I tried g++, clang and visual studio. The results were

Visual studio 2015 - did not do the optimization.
Gcc - did the optimization, removing the if entirely.
Clang - did the optimization.

Clang also did something interesting. It realized in the unsigned case b + 1 > b is only not true, when b = 0xffffffff, so just tested for that. Clever.