Saturday, November 15, 2008

Effect of using a constant parameter for string types

It is not rare to find procedures/functions/methods that use a value parameter for read-only string arguments. While I always use constant parameters in such cases, the real benefit of this practice was not clear to me. Until today.

I made a small application that implements two versions of a procedure that are identical except for the kind of parameter (value vs. constant)...

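program asmConstParameter;

{$Mode ObjFpc}
{$H+}

uses
  SysUtils, Types;

// Consumes the string; the same for both tests
procedure DoIt(V: String);
begin
  Writeln(V);
end;

// Value parameter: the compiler has to add reference counting
// code for V (take a reference on entry, release it on exit)
procedure ByValue(V: String);
var
  S: String;
begin
  S := V;
  DoIt(S);
end;

// Constant parameter: V cannot be modified, so the caller's string
// is used as is, without touching its reference count
procedure ByReference(const V: String);
var
  S: String;
begin
  S := V;
  DoIt(S);
end;

var
  X: String;

begin
  X := 'Test';
  ByValue(X);
  ByReference(X);
end.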

...and examined the assembler output (the -a compiler option keeps the generated assembler file). See the difference for yourself. Enabling optimizations through the -O compiler options does not change the generated code.

So using a constant parameter has a practical effect; it is not only good coding practice.

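BTW: just out of curiosity, I put {$IMPLICITEXCEPTIONS OFF} in the program header, which disables the implicit try..finally frames the compiler generates to finalize managed types such as strings. Not bad. Be aware that this info is here just for curiosity ;-)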

UPDATE: Using constant arguments also benefits ShortString types. See.

Sunday, October 26, 2008

Status of Virtual Treeview port

It has been more than a year since the last blog update about the Virtual Treeview (VTV) port, and some people may be wondering what happened to it.

The port is far from dead. In fact, it has been fully working under Win32, Gtk1/2 and Qt for at least the past six months. I was just waiting for the Lazarus 0.9.26 release to make an official release. Lazarus 0.9.26 is out, so where is the new VTV?

One of the main features of Lazarus 0.9.26 is Unicode support in the Win32 interface which, in turn, uses UTF-8 encoding. Currently VTV supports Unicode through UTF-16/WideString, and that was working fine. After the LCL Unicode switch, some encoding conversion problems appeared when interacting with strings returned by databases or LCL controls, so I decided to bring forward the migration to UTF-8 before doing a release.

This task is not trivial and I'll only start working on it after December, so don't expect a release soon.

This has some advantages:
  • There will be no need for further string type changes (WideString -> String) in applications created with the released component
  • It will be faster, since WideString is known to be slow, especially under Win32
Anyway, if you are interested in testing the component, check the instructions on how to use the svn version here. Be aware that a change from the WideString to the String type will be necessary in the long run.

Sunday, June 01, 2008

LCL, Gtk2, Pango and Cairo

Introduction

Lately there has been a lot of discussion on the Lazarus mailing list about the performance of LCL/TextOut under Gtk2, and many of the (erroneous) arguments, including my own, derive from a lack of proper knowledge. So in this article I'll try to clear things up a bit.

When Gtk2 was released back in 2002, Pango was one of its main new features. Pango provides support for rendering Unicode text (encoded in UTF-8) with advanced layouts. But it was also one of the reasons for the degraded performance of Gtk2 compared with its predecessor.

Pango has a flexible design that allows different backends (Cairo, Xft, Win32, X) to be used to render text. Up to version 2.6, Gtk2 used the Xft backend, which was replaced, starting with version 2.8, by Cairo (a 2D vector drawing library) as the default renderer. At that point performance dropped even more, but after some work on Pango and Cairo, things got better.

So, the first point to take into consideration when evaluating Pango performance is the version of Pango and Cairo. More on this later.

Testing LCL

In order to get an accurate diagnosis, I wrote a test application that fills an entire window with text using TextOut, and compared the results (output quality and drawing time) of the Linux widgetsets (Gtk1, Gtk2, Qt).

Here is the output:

[Screenshots: the test window drawn with TextOut under LCL/Gtk1, LCL/Gtk2 and LCL/Qt]

The time to fill the entire window (average of 15 iterations):

gtk1: 24ms
gtk2: 150ms
qt: 92ms

The gtk1 widgetset is really fast but it has two drawbacks: the font quality is low and the screen flickers while updating.
The gtk2 widgetset is the slowest, taking about 60% longer than qt (150ms vs 92ms), but on the other hand it has the sharpest font rendering. An important point is that when double buffering is disabled (LCL disables it by default), the screen flickers just like gtk1, but if double buffering is enabled there's no flicker and the screen is updated instantaneously.
The qt widgetset sits in the middle both in quality and in speed. I'm not sure why qt text looks blurred: whether it is a configuration issue or the default font is just not that good. There's also no screen flicker, since qt is fully double buffered.

Gtk2 dissected (almost)

Pango allows more than one engine to be used, and there are also the options of drawing directly with Cairo or with the old Gdk functions (which use X11 bitmap fonts). So I wrote an application using direct calls to the Gtk2 API. It draws text with default Gtk2 (Pango/Gdk), Pango/Xft, Xft, Cairo and Gdk/X11.

The output (Cairo is identical to Gtk2, and Gdk/X11 is identical to LCL/Gtk1):

[Screenshot: text drawn with each of the methods above]

The time results:

Gtk2 (Pango/Gdk): 130ms
Cairo: 90ms
Xft: 70ms
Gdk/X11: 12ms
Pango/Xft: 110ms

Gtk2: very close to LCL/Gtk2
Cairo: the same quality as Pango/Gdk (I would be very surprised if it were different ;-)) but significantly faster
Xft: almost twice as fast as Pango/Gdk (70ms vs 130ms), but with lower quality output (a configuration issue?)
Gdk/X11: basically the same output as gtk1. Again really fast, but without the screen flicker of gtk1.
Pango/Xft: faster than Pango/Gdk, slower than Xft. Not an option, since it does not work properly: it always draws at coordinates 0,0.

Alternatives to pango?

From these tests, using Cairo directly to draw text seems to be a good replacement for Pango: the same quality, and faster. But it is not that easy. With direct calls to Cairo, all sorts of text positioning (bidi, alignment) and text styles (underline) have to be handled manually. It would also be necessary to implement the font selection and loading logic manually in a system-specific way, i.e., different code would be needed for Linux/Win32/MacOSX.

Another option is to use Xft directly. Aside from the different/worse output, it would be necessary to change the widgetset to retrieve the XftDraw handle for each drawable, which is a complex task, especially for double-buffered controls. All in a system-specific way.

Gdk/X11: the output quality and the lack of Unicode support make it a no-go. At least for me.

Conclusion

Is there a direct/easy/faster replacement for Pango under LCL/Gtk2? No.

Here another question must be asked: is there a real need to replace it? As shown earlier, default Pango is really slow compared to the alternatives but, especially when double buffering is enabled, there are no visible glitches or general system slowdown. It is also necessary to take into account the increase in code size and complexity on the LCL/Gtk2 side that such a change would bring.

Notes

  • The test applications can be found here and here. They require the chronolog package;
  • The tests were conducted on a Celeron 1.4GHz with 512MB RAM and Intel integrated video, running Ubuntu 8.04;
  • Upgrading from Ubuntu 7.10 to 8.04 led to a significant speedup in all widgetsets (Gtk2 250 -> 150ms / Gtk1 50 -> 24ms);
  • I also tested drawing with the low-level Pango API. I'll post the results later;
  • Win32 took an impressive 5ms in the same test. Not to be taken into account, since the Win32 test was done on another (more powerful) machine;
  • A point that is not directly related to text drawing but affects the performance of LCL/Gtk2 is that the LCL/Gtk2 test application calls FcFontSort 4 times, while a pure Gtk2 application calls it only once. This is an expensive call that takes almost 20% of the application time.

Saturday, March 15, 2008

Reduce memory usage of LCL

To test the conclusions of the last post, I modified the field order of some LCL classes to group Boolean fields together.

Here are the values returned by InstanceSize:

TButton
before: 874 bytes
after: 829 bytes (saved 24, 16 and 5 bytes in TControl, TWinControl and TCustomButton respectively)

TMenuItem
before: 152 bytes
after: 136 bytes

In an application with 100 controls and 20 menu items you save 2400 bytes (100 x 24, considering only the TControl savings) and 320 bytes (20 x 16) respectively.

Maybe this is negligible on computers with 1GB or more of RAM, but on mobile platforms it makes a difference.

The patch is here. Have fun!

Tuesday, February 19, 2008

Memory layout (and size) of an object

After reading an article about the memory layout of objects in Delphi, I was curious about how fpc behaves. So I did some small tests:

Memory layout of objects (instances of a class)

At offset 0 resides the pointer to the virtual method table (VMT). The fields start at offset 4. Just like in Delphi.

Number of associated methods

Neither the number of methods nor whether they are virtual influences the object size. Just like in Delphi.

Type of the fields

According to the cited article, Delphi reserves 4 bytes for each field even if the type has a size of 1 byte. Here comes the fun.

Take the following classes:

TOneFlagClass = class
  Flag1: Boolean;
end;

TTwoFlagClass = class
  Flag1: Boolean;
  Flag2: Boolean;
end;

The sizes of TOneFlagClass and TTwoFlagClass are 5 and 6 bytes respectively (4 for the VMT pointer and 1 for each field). The memory offsets of Flag1 and Flag2 are 4 and 5.

Delphi is a bit different here. The size of both classes is 8. The memory offsets of the fields are the same as in fpc.

At this point I thought: "In this case it is better to place fields smaller than 4 bytes at the end of the class declaration, to avoid subsequent fields being misaligned with the dword boundary"

I was wrong. Well, half wrong:

Take the following classes:

TFlagFirstClass = class
  Flag1: Boolean;
  Int1: Integer;
end;

TFlagLastClass = class
  Int1: Integer;
  Flag1: Boolean;
end;

The sizes of TFlagFirstClass and TFlagLastClass are 12 and 9 respectively. The compiler allocates 4 bytes for the Boolean field to keep the subsequent field (which has a size of 4 bytes) aligned on the dword boundary.

If another Boolean field (Flag2) is added just after Flag1, the instance size is not affected. In fact, grouping up to 4 Boolean (or other 1-byte type) fields together leads to the same instance size as a single Boolean field, if they are followed by Integer or Pointer-like types.
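A small program like the one below can be used to check these numbers. It is a sketch, not the test I ran: it reuses TFlagFirstClass and TFlagLastClass from above, adds a made-up TTwoFlagsFirstClass with Flag2 right after Flag1, and computes field offsets by subtracting the instance address from the field address. If the results above hold, on i386 it should report sizes of 12, 12 and 9; on 64-bit targets the VMT pointer takes 8 bytes, so the absolute numbers will differ.

```pascal
program FlagOffsets;

{$Mode ObjFpc}

type
  // Boolean first: Int1 must start on a 4-byte boundary, so padding follows Flag1
  TFlagFirstClass = class
    Flag1: Boolean;
    Int1: Integer;
  end;

  // Hypothetical variant: Flag2 reuses the padding, so the size should not change
  TTwoFlagsFirstClass = class
    Flag1: Boolean;
    Flag2: Boolean;
    Int1: Integer;
  end;

  // Boolean last: no padding is needed between the fields
  TFlagLastClass = class
    Int1: Integer;
    Flag1: Boolean;
  end;

// Distance in bytes between the start of an instance and one of its fields
function OffsetOf(Instance: TObject; Field: Pointer): PtrUInt;
begin
  Result := PtrUInt(Field) - PtrUInt(Pointer(Instance));
end;

var
  A: TFlagFirstClass;
  B: TTwoFlagsFirstClass;
  C: TFlagLastClass;
begin
  A := TFlagFirstClass.Create;
  B := TTwoFlagsFirstClass.Create;
  C := TFlagLastClass.Create;
  try
    Writeln('TFlagFirstClass:     size ', TFlagFirstClass.InstanceSize,
      ', Flag1 at ', OffsetOf(A, @A.Flag1), ', Int1 at ', OffsetOf(A, @A.Int1));
    Writeln('TTwoFlagsFirstClass: size ', TTwoFlagsFirstClass.InstanceSize,
      ', Flag2 at ', OffsetOf(B, @B.Flag2), ', Int1 at ', OffsetOf(B, @B.Int1));
    Writeln('TFlagLastClass:      size ', TFlagLastClass.InstanceSize,
      ', Int1 at ', OffsetOf(C, @C.Int1), ', Flag1 at ', OffsetOf(C, @C.Flag1));
  finally
    A.Free;
    B.Free;
    C.Free;
  end;
end.
```

InstanceSize is a class function of TObject, so the sizes can be read without an instance; the instances are only needed to take field addresses.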

In the end, my suggestion is still valid: put the fields smaller than 4 bytes at the end of the class field declarations (or group them together in groups of 4 bytes in total). You will save some memory.

If you are not convinced, compare the size of classes with the following field sequences:
Boolean, Integer, Boolean, Integer, Boolean, Integer, Boolean, Integer
Boolean, Boolean, Boolean, Boolean, Integer, Integer, Integer, Integer
Integer, Integer, Integer, Integer, Boolean, Boolean, Boolean, Boolean
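The three sequences can be compared with InstanceSize directly. The class names below are made up for the example. Following the rules described above, on i386 the interleaved class should come out at 36 bytes and the two grouped variants at 24 bytes each, but treat those as predictions to check on your own compiler rather than measured results.

```pascal
program FieldOrder;

{$Mode ObjFpc}  // in this mode Integer is 32-bit (LongInt)

type
  // Boolean, Integer, Boolean, Integer, ...: each Boolean is followed by padding
  TInterleaved = class
    B1: Boolean; I1: Integer;
    B2: Boolean; I2: Integer;
    B3: Boolean; I3: Integer;
    B4: Boolean; I4: Integer;
  end;

  // Four Booleans grouped together fill exactly one 4-byte slot
  TFlagsFirst = class
    B1, B2, B3, B4: Boolean;
    I1, I2, I3, I4: Integer;
  end;

  // Booleans at the end: no padding between the fields
  TFlagsLast = class
    I1, I2, I3, I4: Integer;
    B1, B2, B3, B4: Boolean;
  end;

begin
  Writeln('Boolean, Integer, ...   : ', TInterleaved.InstanceSize);
  Writeln('4 x Boolean, 4 x Integer: ', TFlagsFirst.InstanceSize);
  Writeln('4 x Integer, 4 x Boolean: ', TFlagsLast.InstanceSize);
end.
```

Note that {$Mode ObjFpc} matters here: in the default fpc mode Integer is 16-bit, which would change the layout.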

Some notes:
  • Object here does not refer to the object type (which has the same memory layout as a record), but to an instance of a class
  • There's no difference between mode delphi and objfpc
  • It's valid only for the i386 architecture. No idea how this works on ppc, amd64, arm

Wednesday, January 23, 2008

Effect of buffer size in deflate and md5

I tested the effect of buffer size on compressing a file using the deflate procedure (paszlib unit) and on calculating its md5 (using the functions of the md5 unit).

I loaded a 30MB file into memory and did the compression/md5 calculation. The buffer size varied from 1024 to 512,000 bytes.

To my surprise no significant difference was found, so no graph this time, since it is almost a flat line.
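The md5 half of the test can be reproduced with a sketch like the following. It is not the original benchmark: it hashes 8MB of pseudo-random bytes held in memory instead of a 30MB file, skips the deflate part, and assumes the MD5Init/MD5Update/MD5Final/MD5Print routines of FPC's md5 unit plus GetTickCount64 from SysUtils (available in FPC 3.x). Whatever the chunk size, the digest must be the same; only the time may change.

```pascal
program Md5BufferSize;

{$Mode ObjFpc}
{$H+}

uses
  SysUtils, md5;

const
  // 8MB of in-memory test data (the original test used a 30MB file)
  DataSize = 8 * 1024 * 1024;
  // Chunk sizes passed to MD5Update, from 1024 up to 512,000 bytes
  ChunkSizes: array[0..4] of PtrUInt = (1024, 4096, 32768, 131072, 512000);

var
  Data: array of Byte;

// Hashes Data by feeding it to MD5Update in pieces of at most ChunkSize bytes
function HashInChunks(ChunkSize: PtrUInt): String;
var
  Context: TMD5Context;
  Digest: TMD5Digest;
  Offset, Len: PtrUInt;
begin
  MD5Init(Context);
  Offset := 0;
  while Offset < DataSize do
  begin
    Len := DataSize - Offset;
    if Len > ChunkSize then
      Len := ChunkSize;
    MD5Update(Context, Data[Offset], Len);
    Inc(Offset, Len);
  end;
  MD5Final(Context, Digest);
  Result := MD5Print(Digest);
end;

var
  I: Integer;
  Start: QWord;
  Digest, FirstDigest: String;
begin
  // Fill the buffer with reproducible pseudo-random bytes
  SetLength(Data, DataSize);
  RandSeed := 42;
  for I := 0 to DataSize - 1 do
    Data[I] := Random(256);

  for I := Low(ChunkSizes) to High(ChunkSizes) do
  begin
    Start := GetTickCount64;
    Digest := HashInChunks(ChunkSizes[I]);
    Writeln(ChunkSizes[I]:7, ' bytes/chunk: ', GetTickCount64 - Start:5, ' ms  ', Digest);
    // The chunk size may change the speed, never the result
    if I = Low(ChunkSizes) then
      FirstDigest := Digest
    else if Digest <> FirstDigest then
      Halt(1);
  end;
end.
```

The data is generated in memory so that disk access does not blur the timings; with a real file, loading it fully before hashing (as in the original test) serves the same purpose.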

Friday, October 26, 2007

What Time Is It?

How many cairo clocks does the world need?

I don't think we have sufficient!