PDA

View Full Version : Compressing array dump


Slice
12/11/2010, 02:36 PM
Well I'm not sure if the titel describes exactly what I mean, but I'm tired and tired so it'll have to do.
Basically, I'm trying to save arrays into a hex-string (that then goes into a database).

Say an array is like this:3
2
56
7
8
5
It would, without any sort of compression, be stored as:0000000200000039000000070000000800000005
That's just such a waste of space!

I saw this in the PAWN Implementer's Guide (http://www.compuphase.com/pawn/Pawn_Implementer_Guide.pdf):
The highest bit of each byte is a “continuation” bit. If it is set, another bytes
with seven more significant bits follows. The most significant 7 bits are stored
first (at the lower file offset/memory address). When a series of bytes have been
decoded, bit 6 (the next to most signification bit) of the first byte is repeated to
fill the complete 32-bits.
That seems like a good idea; however, after a quick look, I'm not sure how I would write code for doing that. I will look into it a bit further, though.

Ideas, anyone?

Slice
14/11/2010, 02:04 PM
Thanks, now I got something to test around with.
However, it seems to just hang. I'll do some more testing later, though.

Btw I'm guessing there should be code for this in the PAWN source (maybe the stuff in "compiler/scpack.c").

Slice
26/01/2012, 07:09 AM
Super-duper bump! I managed to solve this. Here's an example:

main () {
new
array[123] = {1, 2, 3, ...},
compressed_array[100],
compressed_size,
decompressed_array[sizeof(array)]
;

compressed_size = CompressArray(array, sizeof(array), compressed_array);
DecompressArray(compressed_array, decompressed_array);

printf("Array size: %d bytes; compressed size: %d bytes.", sizeof(array) * 4, compressed_size);

for (new i = 0; i < sizeof(array); i++) {
if (decompressed_array[i] != array[i])
printf("ERROR! Index %d not matching (%d != %d).", i, array[i], decompressed_array[i]);
}
}


Note that the compressed array is a packed string!

Here are the functions:
stock CompressArray(const aiArray[], iSize = sizeof(aiArray), aiOutput[]) {
new
iOutputIndex = 4,
iValue,
iMSB,
iShift
;

// * 0b11000000 = Single byte, negative
// * 0b10000000 = Single byte
// * 0b01000000 = Multi-byte
// - 0b01000000 = More bytes
// - 0b11000000 = Last byte
// - 0b10000000 = Unused

for (new i = 0; i < iSize; i++) {
// Will the value fit in one byte?

iValue = aiArray[i];

if (-0b00111111 <= iValue <= 0b00111111) {
// Is the value negative?

if (iValue & 0x80000000) {
// Set the "single byte, negative" bits on and put the value without its sign

aiOutput{iOutputIndex++} = 0b11000000 | -iValue;
} else {
// Just put the value in with the "single byte" bit

aiOutput{iOutputIndex++} = 0b10000000 | iValue;
}
} else {
// Figure out how many bits we'll have to write
iMSB = FindMSB(iValue) + 1;

// Make iShift a multiple of 6 (if it isn't already)
if ((iShift = iMSB % 6))
aiOutput{iOutputIndex++} = 0b01000000 | (iValue >>> (iMSB - iShift) & ~(0xFFFFFFFF << iShift));

iShift = iMSB - iShift;

// Write bits out left-right
while ((iShift -= 6) >= 0)
aiOutput{iOutputIndex++} = 0b01000000 | (iValue >>> iShift & 0b00111111);

// Change the "more bytes" bits into "last byte"
aiOutput{iOutputIndex - 1} |= 0b11000000;
}
}

// Put the number of bytes we just wrote into the first cell of the output
aiOutput[0] = 0x80808080 | ((iOutputIndex & 0x1FE00000) << 3) | ((iOutputIndex & 0x3FC000) << 2) | ((iOutputIndex & 0x7F80) << 1) | (iOutputIndex & 0x7F);

// Make sure the bytes in the last cell are 0
aiOutput{iOutputIndex} = 0;

iValue = iOutputIndex;

while (++iOutputIndex % 4)
aiOutput{iOutputIndex} = 0;

// Return the number of bytes written (not counting the first 4)
return iValue;
}

stock DecompressArray(const aiCompressedArray[], aiOutput[], iOutputSize = sizeof(aiOutput)) {
new
iBytes,
iOutputIndex = 0
;

// Get the number of bytes to parse
iBytes = aiCompressedArray[0];
iBytes = ((iBytes & 0x7F000000) >>> 3) | ((iBytes & 0x7F0000) >>> 2) | ((iBytes & 0x7F00) >>> 1) | (iBytes & 0x7F);

for (new i = 4; i < iBytes; i++) {
// Single byte?
if ((aiCompressedArray{i} & 0b10000000)) {
// Negative?
if ((aiCompressedArray{i} & 0b01000000))
aiOutput[iOutputIndex++] = -(aiCompressedArray{i} & 0b00111111);
else
aiOutput[iOutputIndex++] = (aiCompressedArray{i} & 0b00111111);
} else {
// Multi byte; read the last bits
aiOutput[iOutputIndex] = aiCompressedArray{i} & 0b00111111;

// Keep reading bits while shifting the value to the left
do {
aiOutput[iOutputIndex] <<= 6;
aiOutput[iOutputIndex] |= aiCompressedArray{++i} & 0b00111111;
} while ((aiCompressedArray{i} & 0b10000000) == 0);

iOutputIndex++;
}

// Out of slots?
if (iOutputIndex >= iOutputSize)
break;
}

return iOutputIndex;
}

stock FindMSB(iInput) {
// http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn

static const
aiDeBruijnBitPositionsPacked[32 char] = {
0x0A010900,
0x1D02150D,
0x12100E0B,
0x1E031916,
0x1C140C08,
0x0718110F,
0x06171B13,
0x1F04051A
}
;

if (iInput) {
#emit LOAD.S.pri iInput
#emit MOVE.alt
#emit SHR.C.alt 1
#emit OR
#emit MOVE.alt
#emit SHR.C.alt 2
#emit OR
#emit MOVE.alt
#emit SHR.C.alt 4
#emit OR
#emit MOVE.alt
#emit SHR.C.alt 8
#emit OR
#emit MOVE.alt
#emit SHR.C.alt 16
#emit OR
#emit CONST.alt 0x07C4ACDD
#emit UMUL
#emit SHR.C.pri 27
#emit ADD.C aiDeBruijnBitPositionsPacked
#emit LODB.I 1
#emit RETN
}

return -1;
}