Even Better IntX

This morning I realized the flaw in my IntX scheme that I presented in my last post.

First I realized that my algorithm wasn't working for non-negative integers that were a multiple of 7 significant bits and that my test cases were sloppy that way. For example, 64 (01000000) was encoding as that same byte value. When decoded, the most significant data bit 1 was interpreted as part of a signed number and yielding a result of -64.

I made a quick fix in the sample code I listed so that non-negative numbers wouldn't have an initial MSB of 1. That fixed the encoding problem, but I then belatedly realized the larger problem: that the scheme no longer allows values 0..127 to be encoding in a single byte, nearly doubling the storage size of plain ASCII text.

I'm going back to the IntX scheme I've used for the past 5 years, which works quite well. I didn't do it justice in my previous blog post, so I'll present it fully here. It allows values -64 to 127 to be encoded as a single byte, with larger values taking up to 5 bytes for an Int32. IntX only supports 32-bits total; 64 bit integers must be written as two IntX values to make use of the system.

The encoded bit patterns are as follows:

0xxxxxxx :  0..127
11xxxxxx : -64..-1
10nnxxxx : 4 bits x + n+1 (1..4) bytes follow, signed

In the third case, the 4 x bits are only used when when 1..3 bytes follow. If 4 bytes follow then those four x bits are unused.

The nice thing about this encoding is that 192 of the 256 possible signed byte values are encoded as-is with no extra encoding or decoding work required!