Sunday, September 23, 2007

Zlibar memory behavior compressing many small files

The previous analysis of zlibar memory behavior took the compression of one big file as its example. Let's try it with many small files (all *.pas files under the lazarus/lcl dir).

Original (load the entire file in memory):
The heaptrc dump:
18573 memory blocks allocated : 496119560/496173048
18573 memory blocks freed : 496119560/496173048
0 unfreed memory blocks : 0
True heap size : 4423680
True free heap : 4423680


After memory optimization (load file in small buffer):
The heaptrc dump:
17446 memory blocks allocated : 439659064/439708048
17446 memory blocks freed : 439659064/439708048
0 unfreed memory blocks : 0
True heap size : 2326528
True free heap : 2326528
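
To put numbers on the difference between the two dumps, here is a quick sketch in shell arithmetic. It only uses the figures printed above (the first number of each allocated pair and the "True heap size" line); the variable names and the saved_pct helper are made up for illustration, and it assumes a shell with 64-bit integer arithmetic such as bash on a 64-bit system.

```sh
#!/bin/bash
# Figures copied from the two heaptrc dumps above.
orig_blocks=18573;    opt_blocks=17446
orig_bytes=496119560; opt_bytes=439659064
orig_heap=4423680;    opt_heap=2326528

# Hypothetical helper: integer percentage saved going from $1 to $2.
# Assumes 64-bit shell arithmetic (the byte totals times 100 overflow 32 bits).
saved_pct() { echo $(( ($1 - $2) * 100 / $1 )); }

blocks_saved=$(( orig_blocks - opt_blocks ))
bytes_saved=$(( orig_bytes - opt_bytes ))
heap_saved=$(( orig_heap - opt_heap ))

echo "blocks allocated: $blocks_saved fewer ($(saved_pct "$orig_blocks" "$opt_blocks")%)"
echo "bytes allocated:  $bytes_saved fewer ($(saved_pct "$orig_bytes" "$opt_bytes")%)"
echo "true heap size:   $heap_saved smaller ($(saved_pct "$orig_heap" "$opt_heap")%)"
```

The heap size difference works out to exactly 2097152 bytes (2 MiB), a bit under half of the original heap, while the number of blocks and the total bytes requested drop by a much smaller fraction.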



We can draw some conclusions:
  • Memory usage over time is almost equal in both builds (the graph scale does not help much here)
    UPDATE: the heaptrc dump shows that the original code really takes more memory.
  • In the optimized build, memory is allocated continuously and always grows. In the original build, memory is allocated and freed throughout the run, while still growing by the end. This can lead to more memory fragmentation.
  • The optimized build lacks the final peak. This was not expected, since the code responsible for the peak was not changed. Some possible explanations: 1) the peak exists but valgrind does not detect it; 2) there is a bug in the optimized code; 3) it is an unexpected (and good) side effect.

4 comments:

Anonymous said...

Hi

Interesting posts lately about valgrind and massif, thanks.

As for the CMEM_xxx function names, I managed to avoid them with --alloc-fn. Two posts earlier you wrote that it didn't work for you, but it worked quite OK for me. The important hint may be to quote symbols that have $ in their names, otherwise bash will mangle the $ signs. Single quotes are safest.
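
A small sketch of the quoting problem, for illustration only: inside double quotes the shell expands $LONGINT and $POINTER as variables and $$ as its own process ID, so the symbol never reaches valgrind intact. The variable names single and double are made up for this example, and it assumes LONGINT and POINTER are not set in the environment.

```sh
#!/bin/sh
# Make sure the names inside the symbol are not accidentally defined.
unset LONGINT POINTER

# Single quotes: every $ is kept literally, which is what --alloc-fn needs.
single='CMEM_CGETMEM$LONGINT$$POINTER'

# Double quotes: $LONGINT expands to nothing and $$ expands to the shell's PID.
double="CMEM_CGETMEM$LONGINT$$POINTER"

printf 'single-quoted: %s\n' "$single"
printf 'double-quoted: %s\n' "$double"
```

The double-quoted version comes out as CMEM_CGETMEM followed by a process ID and the word POINTER, which matches no symbol in the program. In a script that runs with set -u, the unset $LONGINT would abort the script instead.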

I wrote a simple script to run my programs through massif:

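#!/bin/sh
# Stop on the first error and on any unset variable.
set -eu

# Each --alloc-fn symbol is single-quoted so the shell leaves the $ signs alone.
# "$@" passes the program to profile and its arguments through unchanged.
valgrind --tool=massif \
  --alloc-fn='CMEM_CGETMEM$LONGINT$$POINTER' \
  --alloc-fn='CMEM_CREALLOCMEM$POINTER$LONGINT$$POINTER' \
  --alloc-fn='SYSTEM_GETMEM$LONGINT$$POINTER' \
  --alloc-fn='SYSTEM_GETMEM$POINTER$LONGINT' \
  --alloc-fn='SYSTEM_REALLOCMEM$POINTER$LONGINT$$POINTER' \
  --format=html \
  "$@"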

As you see, I use alloc-fn for the CMem functions, and also for System.GetMem/ReallocMem and initialization of TObject. I found it most useful this way. I also have a couple of symbols from my own code on this list, and it looks like it may be useful to add SYSTEM_TOBJECT_$__NEWINSTANCE$$TOBJECT there as well.

Note that even without --alloc-fn, you can always look at the txt files generated by massif (or html, if you used --format=html) and check depths larger than 0. Even if depth = 0 contains only CMem wrappers, larger depths will lead to the actual functions from your program.

Hope this helps :)

Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...

Uhh, sorry for the double post, I mixed up the "publish" and "preview" buttons. Please delete the second post.

And, the intention was to write that SYSTEM_TOBJECT_$__NEWINSTANCE$$TOBJECT is for initialization of TObject...

Luiz Américo said...

Thanks for the hints. I was using only the Pascal names (CMEM_CGETMEM) without the parameter list.

I will recreate the graphs based on your info. I'm sure it will give more information on what's going on.