Sunday, June 22, 2008

Identifying memory leaks on AIX

Memory leaks are often a pain, especially if the program runs for a long time. The pain is made worse by the difficulty of pinpointing the exact place where the leak occurs. There are many commercial tools which do an excellent job of pinpointing memory leaks, but most of them come at a cost. If you use AIX, though, you have a plain-Jane but very flexible and usable option: the report_allocations feature of the malloc debug tool.
With every call to malloc(), a log entry is added which stores the stack the allocation was made from (yes, it does a dbx-like stack walk), and with every call to free() the corresponding entry is deleted. An atexit handler is registered which prints all the entries still logged when the program exits, that is, the allocations that were never freed. I guess this clarifies how report_allocations works.
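To make the bookkeeping concrete, here is a minimal sketch of the same idea in C++. It is not how AIX implements it: the names tracked_malloc, tracked_free and outstanding_count are made up for illustration, and a caller-supplied label stands in for the recorded stack trace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

// One log entry per live allocation. The real tool stores a stack trace;
// this sketch stores a caller-supplied label instead (an assumption).
struct AllocRecord {
    std::size_t size;
    std::string where;
};

// The table is intentionally never destroyed, so it is still valid
// when the atexit handler runs.
static std::map<void*, AllocRecord>& live_allocations() {
    static std::map<void*, AllocRecord>* table =
        new std::map<void*, AllocRecord>();
    return *table;
}

// Exit handler: print every entry that was never removed by tracked_free().
static void report_outstanding() {
    for (const auto& entry : live_allocations()) {
        std::fprintf(stderr, "leak: %zu bytes at %p from %s\n",
                     entry.second.size, entry.first,
                     entry.second.where.c_str());
    }
}

// Allocate and add a log entry; registers the exit handler on first use.
void* tracked_malloc(std::size_t size, const char* where) {
    assert(where != nullptr);
    static bool registered = false;
    if (!registered) {
        std::atexit(report_outstanding);
        registered = true;
    }
    void* p = std::malloc(size);
    if (p != nullptr) {
        live_allocations()[p] = AllocRecord{size, where};
    }
    return p;
}

// Free and delete the corresponding log entry.
void tracked_free(void* p) {
    if (p == nullptr) return;
    live_allocations().erase(p);
    std::free(p);
}

// Number of allocations that have not been freed yet.
std::size_t outstanding_count() {
    return live_allocations().size();
}
```

Anything allocated through tracked_malloc and never passed to tracked_free is printed to stderr when the program exits, which is the same shape of report the malloc debug tool gives you, minus the real stack walk.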
To enable report_allocations, one needs to export the following environment variables:

export MALLOCTYPE=debug
export MALLOCDEBUG="report_allocations,stack_depth:4"

stack_depth indicates how many stack frames are stored for each allocation, the default being 4 and the maximum being 32.
Malloc debug does not do any aggregation of allocations of a similar kind (that is, allocations of the same size from the same stack trace), but prints all allocations one by one. This is sometimes a drawback, but one can easily write a script around malloc debug to aggregate the data. One such script is readily available in a developerWorks article. The author of the script warns that it may break in some corner cases, but I guess we can fix those issues. Over to developerWorks for more info.
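Separately from that script, the aggregation step itself is simple enough to sketch. The C++ below assumes the report has already been parsed into (size, stack) records; the Allocation and Summary types and the aggregate function are hypothetical names, and parsing the tool's actual output is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One reported allocation: its size and its stack trace flattened to a string.
// Both fields are assumed to come from an already-parsed report.
struct Allocation {
    std::size_t size;
    std::string stack;
};

// Running totals for one (stack, size) group.
struct Summary {
    std::size_t count = 0;
    std::size_t total_bytes = 0;
};

typedef std::map<std::pair<std::string, std::size_t>, Summary> Groups;

// Group allocations of the same size from the same stack trace.
Groups aggregate(const std::vector<Allocation>& allocations) {
    Groups groups;
    for (const Allocation& a : allocations) {
        Summary& s = groups[std::make_pair(a.stack, a.size)];
        s.count += 1;
        s.total_bytes += a.size;
    }
    return groups;
}
```

Sorting the resulting groups by total_bytes is usually the quickest way to see which call site is responsible for most of the leaked memory.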

Monday, June 16, 2008

The beginnings: Speeding up C++ STL Vector applications

Having started this blog quite some time ago, I didn't have any idea what I was going to post in it. I only knew that it would contain tips and tricks for AIX which one might find handy, some of which can be found elsewhere and some of which cannot. I am still not sure what sort of tips and tricks I'm going to put here, but I'll put them, that's for sure. Don't ask me about the frequency.

This first post is for those who use STL vectors heavily in their C++ code. If you want better performance, you might want to try the 3.1 malloc allocation policy. All you have to do is export MALLOCTYPE with one of the following values before running your code:

export MALLOCTYPE=3.1 #for 32-bit applications
export MALLOCTYPE=3.1_64BIT #for 64-bit applications

How does it help? Well, it appears that a vector keeps growing as and when it needs to, hence a lot of calls to realloc(). With the 3.1 allocation policy, requests are rounded up to the next higher power of 2. That is, if you request 1026 bytes, you get 2048 bytes. Hence when a realloc is done, it often returns without doing any work, since the memory is already there.
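The rounding is easy to illustrate. The sketch below is not the AIX allocator: round_up_pow2 and grows_in_place are hypothetical helpers, and any per-block bookkeeping overhead a real allocator adds is ignored.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power of two that is >= n (n must be at least 1).
// Sketch only: requests too large to round up without overflow are not handled.
std::size_t round_up_pow2(std::size_t n) {
    assert(n >= 1);
    std::size_t capacity = 1;
    while (capacity < n) {
        capacity <<= 1;
    }
    return capacity;
}

// True if a block originally requested as old_request bytes already has
// room for new_request bytes, so growing it needs no move and no copy.
bool grows_in_place(std::size_t old_request, std::size_t new_request) {
    return new_request <= round_up_pow2(old_request);
}
```

With these helpers, round_up_pow2(1026) is 2048, so growing that block to anything up to 2048 bytes fits in the memory already handed out, while growing it to 2049 bytes does not.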
3.1 is also faster than the default allocation policy. The default policy uses a tree-based data structure, which makes the average allocation time O(log(n)), whereas 3.1 is a bucket-based allocator with time complexity O(1).
The caveat to using 3.1 is that it is wasteful of memory, so it's up to you to decide what you want to compromise on: memory usage or speed.

How did we find this out? Well, we were running some benchmark tests, and on exporting MALLOCTYPE=3.1 we found that performance shot up by 40%. Of course, the benchmark we were running was memory bound.