Sunday, September 23, 2007

Reduce zlibar memory usage - step1

Previously we learned that zlibar uses 1.5MB of memory to compress a 1.1MB file. It compresses each file in three steps: 1) load the entire file into memory; 2) feed the deflate function with this data, using a small buffer as a bridge; 3) calculate the MD5 signature, traversing the file data again.

The alternative is to compress in a single pass: load the file data incrementally into a memory buffer and then feed the deflate and MD5 functions with it. This has two advantages: memory usage (in this step) is bounded by the size of the buffer, and we avoid copying data from the stream that holds the file data to the deflate buffer and to the MD5 buffer.

The modified InternalCompressStream function would be something like:

MD5Init(Context);
z.next_in := @input_buffer;
z.avail_in := FileRead(InStream, input_buffer, MAX_IN_BUF_SIZE);
MD5Update(Context, input_buffer, z.avail_in);
while z.avail_in > 0 do
begin
  repeat
    z.next_out := @output_buffer;
    z.avail_out := MAX_OUT_BUF_SIZE;
    err := deflate(z, Z_NO_FLUSH);
    OutStream.Write(output_buffer, MAX_OUT_BUF_SIZE - z.avail_out);
  until z.avail_out > 0;
  z.next_in := @input_buffer;
  z.avail_in := FileRead(InStream, input_buffer, MAX_IN_BUF_SIZE);
  MD5Update(Context, input_buffer, z.avail_in);
end;
MD5Final(Context, Result.Md5Sum);
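The same one-pass pattern (read a chunk, update the hash, feed the compressor, repeat until the input is exhausted) can be sketched in Python with the standard zlib and hashlib modules. The function name `compress_stream` and the buffer size are arbitrary choices for this illustration, not part of zlibar:

```python
import hashlib
import zlib

CHUNK = 64 * 1024  # read buffer size; this bounds the memory used per pass

def compress_stream(in_file, out_file):
    """One-pass compression: each chunk feeds both MD5 and deflate."""
    md5 = hashlib.md5()
    comp = zlib.compressobj()
    while True:
        chunk = in_file.read(CHUNK)
        if not chunk:
            break
        md5.update(chunk)                     # hash the raw data...
        out_file.write(comp.compress(chunk))  # ...and compress it in the same pass
    out_file.write(comp.flush())  # the equivalent of a final deflate(Z_FINISH)
    return md5.hexdigest()
```

Note that each chunk is visited in memory only once, so peak usage stays near CHUNK plus zlib's internal state, regardless of the input file size.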

Let's run valgrind/massif to see what we get:

Comparing with the previous graph, we notice a great reduction in memory usage: from 1.5MB to 0.5MB.

The heaptrc dump:
55 memory blocks allocated : 2811961/2812056
55 memory blocks freed : 2811961/2812056
0 unfreed memory blocks : 0
True heap size : 950272
True free heap : 950272

Let's do a deeper analysis:
  • The memory used by the deflate functions is close to the expected 256KB.
  • The pink area is the memory used by the stream that holds the compressed data. It is allocated incrementally, hence the ascending slope.
  • There's a peak after the deflate memory is freed, just before the program finishes. This corresponds to the copy from the compressed stream to the output stream, which doubles the data in memory. More on this later.
Someone may argue that loading a file through a small buffer is slower than reading all the data into memory at once. This is true, but that approach is reasonable only for small files; in a general-purpose packer it would be a big limitation.
