This morning I realized the flaw in my IntX scheme that I presented in my last post.
First I realized that my algorithm wasn't working for non-negative integers that were a multiple of 7 significant bits and that my test cases were sloppy that way. For example, 64 (
01000000) was encoding as that same byte value. When decoded, the most significant data bit
1 was interpreted as part of a signed number and yielding a result of -64.
I made a quick fix in the sample code I listed so that non-negative numbers wouldn't have an initial MSB of
1. That fixed the encoding problem, but I then belatedly realized the larger problem: that the scheme no longer allows values 0..127 to be encoding in a single byte, nearly doubling the storage size of plain ASCII text.
I'm going back to the IntX scheme I've used for the past 5 years, which works quite well. I didn't do it justice in my previous blog post, so I'll present it fully here. It allows values -64 to 127 to be encoded as a single byte, with larger values taking up to 5 bytes for an Int32. IntX only supports 32-bits total; 64 bit integers must be written as two IntX values to make use of the system.
The encoded bit patterns are as follows:
0xxxxxxx : 0..127 11xxxxxx : -64..-1 10nnxxxx : 4 bits x + n+1 (1..4) bytes follow, signed
In the third case, the 4
x bits are only used when when 1..3 bytes follow. If 4 bytes follow then those four
x bits are unused.
The nice thing about this encoding is that 192 of the 256 possible signed byte values are encoded as-is with no extra encoding or decoding work required!