Tuesday, September 18, 2007

Using Valgrind to profile fpc applications

Before starting to optimize, it is necessary to know what to optimize and where. This is where profiling tools come in. Valgrind and its companion KCachegrind are well-known Unix profiling tools that are popular with the C crowd. Let's see whether they are useful to fpc programmers.

Zlibar is an fpc component that wraps the paszlib functions in a programmer-friendly way. I use it in the Becape application, and for some time I have been planning to optimize it.

The heart of the component is the TZlibWriteArchive.CreateArchive method, which compresses the files (InputFiles) into a stream (OutputStream):

TmpStream := TMemoryStream.Create;
TmpFile := TMemoryStream.Create;
[..]
for X := 0 to fInputFiles.Count-1 do begin
[..]
TmpFile.LoadFromFile(fInputFiles.FileName[X]);
[..]
FileInfo.CompressedSize := InternalCompressStream(X, TmpFile, TmpStream); //(1)
FileInfo.Md5Sum := StreamMD5(TmpFile);
[..]
end;
WriteHeader(AHeader);
OutStream.CopyFrom(TmpStream, TmpStream.Size);//(2)
[..]


It creates two temporary memory streams (TmpFile and TmpStream). TmpFile holds the uncompressed data of the file being added. TmpStream is filled with the compressed data in step (1). The process continues until all files are compressed into TmpStream.
After that, the header is written to the OutputStream, followed by the compressed data in step (2).

The problems:
  1. Each file is loaded entirely into memory before it is processed. That is fine for small files, but it becomes a problem for larger ones.
  2. Even for small files, the memory of TmpFile will be reallocated in most of the LoadFromFile calls.
  3. After step (2), there are three streams in the heap: an uncompressed file, the compressed data of all files, and a header plus the compressed data of all files.

To see this in action, I created a small application that compresses a single file (VirtualTrees.pas, 1.1 MB) and compiled it with -gv (generate code for Valgrind) and -gl. Then I ran it under valgrind (callgrind):

valgrind --tool=callgrind ./zlibar_opt
This created a file named callgrind.out.[pid], which I loaded in KCachegrind. This tool shows most of the function calls the application made, how many times each function was called, who called whom, and the cost of each function (callgrind counts executed instructions rather than wall-clock time).
To my surprise, the memory allocation routines cost almost nothing (in fact, zero). The most expensive functions were, of course, the compression-related ones.
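
The build-and-profile round trip described above can be sketched as a small shell function. This is only a sketch under assumptions: the source file name zlibar_opt.pas is a guess (the post only names the binary), profile_callgrind is a made-up helper name, and callgrind_annotate is the text-mode report tool shipped with Valgrind, which can stand in when KCachegrind is not at hand.

```shell
# Sketch only: build with Valgrind support and line info, profile, print a summary.
# The default source name is an assumption; only ./zlibar_opt is named in the post.
profile_callgrind() {
    src="${1:-zlibar_opt.pas}"
    bin="./$(basename "$src" .pas)"

    # Same switches as in the post: -gv (code for Valgrind) and -gl (line info).
    fpc -gv -gl "$src" || return 1

    # Writes callgrind.out.<pid> into the current directory.
    valgrind --tool=callgrind "$bin" || return 1

    # Pick the most recent output file and show the top of the text report.
    latest=$(ls -t callgrind.out.* | head -n 1)
    callgrind_annotate "$latest" | head -n 30
}
```

After sourcing the snippet, calling profile_callgrind with no argument uses the assumed default file name; pass another .pas file to profile a different program.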





Now let's use the massif tool:

valgrind --tool=massif ./zlibar_opt
It creates two files: massif.[pid].ps and massif.[pid].txt. The txt file contains information about the call stack and how much memory each function allocated. The ps file contains a graph showing the functions responsible for most of the memory allocation and how it evolves over time. See below:


Now you say "what the hell is this?"
To work with Valgrind, the -gv option forces the use of the cmem memory manager, which is a wrapper around malloc, so massif treats the cmem* functions as the program's allocation routines. I tried to use the --alloc-fn option to force the display of the Pascal functions, without success.

Update: I was passing only the Pascal function name (CMEM_GETMEM) to --alloc-fn, while the parameter list is also required. I updated the charts to show where in the Pascal code the memory is allocated.
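
As a rough sketch of that fix: one way to get the full symbol names, parameter list included, is to read them straight from the binary with nm and pass them to massif unchanged. The helper names list_cmem_symbols and massif_alloc_fn are made up for this example, and the exact symbols depend on the fpc version, so none are hard-coded here. Note that fpc's mangled names contain $ characters, so they need single quotes on the command line.

```shell
# List the cmem symbols exactly as they are spelled in the binary.
list_cmem_symbols() {
    nm "${1:-./zlibar_opt}" | grep -i 'cmem'
}

# Run massif, treating every given symbol as an allocation function.
# Usage: massif_alloc_fn ./zlibar_opt 'SYMBOL_ONE' 'SYMBOL_TWO'
massif_alloc_fn() {
    bin="$1"
    shift
    # Rotate the argument list, turning each symbol into --alloc-fn=<symbol>.
    for sym in "$@"; do
        set -- "$@" "--alloc-fn=$sym"
        shift
    done
    valgrind --tool=massif "$@" "$bin"
}
```

Copying the names from the nm output avoids guessing how the compiler mangles the parameter list.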

Anyway, the graph shows that 1.5 MB of heap memory is allocated (used?*) at its peak, and that is the point where we can optimize.

Here's the heaptrc dump:
58 memory blocks allocated : 3926105/3926208
58 memory blocks freed : 3926105/3926208
0 unfreed memory blocks : 0
True heap size : 950272
True free heap : 950272
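
For completeness, a dump like the one above can be produced with a separate heaptrc build. This is a sketch under assumptions: the source file name is again a guess, heaptrc_report is a made-up helper name, and the build uses -gh (which links in the heaptrc unit) instead of -gv. By default heaptrc prints its summary to standard error when the program exits.

```shell
# Sketch only: build with heaptrc and capture its exit report.
heaptrc_report() {
    src="${1:-zlibar_opt.pas}"
    bin="./$(basename "$src" .pas)"

    # -gh links in the heaptrc unit; -gl adds file and line info to its traces.
    fpc -gh -gl "$src" || return 1

    # The heaptrc summary goes to stderr, so redirect it to a log file.
    "$bin" 2> heaptrc.log
    cat heaptrc.log
}
```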

In the next articles, I will look at the possible optimizations.

Notes:
  • * The fpc heap manager preallocates space in the heap that is sometimes not fully used, but I don't know whether this still applies when the cmem functions are used
  • The -gv option is necessary to run the massif tool but is not needed for callgrind
  • The valgrind memcheck tool is of little use to fpc programmers, since fpc provides the heaptrc unit, which has a lot of advantages
  • One drawback of valgrind is that it is Unix-only. No Windows.
  • For more info, see the valgrind manual
