As a side issue I've been doing some work on hashing. These results were produced on Linux on 1007U/x64 CPU...
Time Hits Hash
U Scatter (E) 17765 95 e3bf408c
U OneAtATime (E) 24078 91 c89a1410
A RabinKarp 5840 91 8a2bee68
U RabinKarp 5867 96 f2e8958c
A Bernstein 6438 96 764b8998
U Bernstein 6220 91 ed3197fc
A Hseih 9464 91 f62f7790
U Hseih 9954 101 4a7dffd4
A Fnv1aMeiyan 4956 101 63b87c78
U Fnv1aMeiyan 6105 97 1ff4fd14
A CrapWow 6243 97 dea09e44
U CrapWow 6705 94 15e59b0c
A Default 6589 94 8a2bee68
U Default 7178 96 f2e8958c
The A/U is if it is an aligned or unaligned implementation. The A means 32 bit alignment, U is unaligned. On x86 cpus you can do unaligned reads, and they are actually surprisingly fast. On other CPUs that would cause a fault, and there is another code path which does the alignment in software.
The (E) means that the algorithm is endian independent - generally meaning it reads data in byte order. As seen from the results it's typically 1/4 of the speed. That's not very surprising when you consider most other hashes are working with 32 bit reads, so can do 4 times as much work per iteration.
The results for 64 bit hashes...
U Scatter 17303 96 e520d643ba36f0d0
A RabinKarp 6358 96 55a767b88a2bee68
U RabinKarp 6399 96 17481ff7f2e8958c
A Bernstein 6195 96 70b01560764b8998
U Bernstein 5051 91 ab135279ed3197fc
A Default 6740 91 55a767b88a2bee68
U Default 7120 96 17481ff7f2e8958c
The hits column is a fairly crude measure of how many times a bucket is hit using the hash. The hash is most often used with set/maps. In my implementation these are generally power of 2 sized. Approximately take the closest larger power of 2 and double it. The hits value measures how many times a power of 2 bucket was hit. In terms of hash usage, lower is generally better.
I liked this hash function article, and was interested in improving endian independent hashing - something that will be useful for serialization. I tired OneAtATime as it recommended but the performance is terrible. The scatter algorithm is very simple - it just reads a byte does a random table lookup and mixes by rotating and xoring. It's performance was better but still pretty slow.
So I decided to see if I could use RabinKarp. I already had code which allows it to work with unaligned data. As RabinKarp works by reading int32s there needs to be byte swap on one of the endian types - I chose big endian as it's more unusual.
In terms of more general usage my actual raw data may not be in 'byte order' in memory - an array of int16s for example will already have each int16 byte swapped for the endianess of the machine.You could do the endian flip for the uint16, and then endian flip the int32 you read, but you can do it more efficiently by just doing the int32 read, and doing short swap on that. With int32 data, no swap is needed (it's already in the right order), and on int64 - you need to do a int32 swap on the int64.
It took a while to get this all to work. I also wrote code to shift around, and resize chunks of data to hash, as well as some code to reorganise memory such that a block of memory when read would appear as it would on a flipped endian machine. This was important because I don't have an easy setup for testing big endian code. I could pull out the PowerPC mac mini - but that's going to be painful :)
In conclusion I'm pleased with the reorganisation, and it's good to have code that can give good analytic results to decide which code path is best to use. That said retrospectively the work on improving endian independent hashing was probably not really worth it. It's a fairly rare code path - just something about it got in my craw. It seemed simple enough, but there are lots of annoying details when you mix arbitrary endianess/alignment/length. Hey ho.