Monday, December 20, 2010

PowerPC Assembly Tutorial on AIX: Chapter 5, Saving non-volatile registers

We will now return to the problem we left open: why was the exit value 97 instead of 5? We now know that volatile registers do not retain their values across function calls, and that when we called getchar(), the contents of r3 were overwritten with its return value. How do we overcome this? We can copy the value of r3 into a non-volatile register before calling getchar. By convention, non-volatile registers are used in the order r31, r30 ... r13. Also, each non-volatile register we use must first be saved on the stack, and restored before we return from the function. Hence the previous program should be modified as follows:

  1. .set r0, 0
  2. .set r1, 1
  3. .set r3, 3
  4. .set r31, 31
  6. .extern            .getchar          # Tell the assembler that .getchar is an external symbol
  8. .csect
  9. .globl  .main
  10. .main:
  11.          # Function prolog begins
  12.          mflr     r0                     # Get the link register in r0
  13.          stw      r31, -4(r1)       # Save caller's r31
  14.          stw      r0, 8(r1)          # Save the link register
  15.          stwu    r1, -64(r1)       # Store the stack pointer, and update. Create a frame of 64 bytes.
  16.          # Function prolog ends
  18.          li          r3, 5
  19.          mr        r31, r3 # Copy contents of r3 into r31 for retention
  20.          bl         .getchar
  21.          ori        r0, r0, 0      # No-op, required by compiler/loader after a branch to an external function
  22.          mr        r3, r31 # Copy contents of r31 back to r3
  24.          # Function epilog begins
  25.          addi    r1, r1, 64         # Restore the stack pointer
  26.          lwz      r31, -4(r1)       # Restore caller's r31
  27.          lwz      r0, 8(r1)          # Read the saved link register
  28.          mtlr      r0
  29.          # Function epilog ends

  30.          blr

In line 13, we save the caller's (in this case __start's) r31 in the stack frame. In line 19, we copy the contents of r3 into r31, so that when getchar returns, although r3 would have been overwritten, we would still retain the value in r31. After getchar returns, we copy the contents of r31 to r3. We restore the caller's r31 in line 26 before returning from this function.

Tuesday, November 30, 2010

PowerPC Assembly Tutorial on AIX: Chapter 4, Most frequently used registers

We will digress a little and introduce the most frequently used registers, before we deal with the question of why we got a return value of 97 instead of 5 in the previous program.

In this section, we will not discuss all the registers of the PowerPC architecture, but only the most frequently used registers of the UISA (User Instruction Set Architecture) model. The UISA model defines the architecture to which user-level programs should conform.

In the UISA model, we have the following registers:

  • 32 General Purpose Registers (GPRs)
  • 32 Floating Point Registers (FPRs)
  • Condition Register (CR)
  • Floating Point Status and Control Register (FPSCR)
  • Fixed-Point Exception Register (XER)
  • Link Register (LR)
  • Counter Register (CTR)

There are 32 GPRs, named GPR0 to GPR31. These registers are 64 bits wide in 64-bit implementations, and 32 bits wide in 32-bit implementations. They are used to manipulate integer data. GPR0 is used in function prologs. GPR1 is used as the stack pointer and GPR2 is used as a pointer to the TOC; these two registers should not be used for any other purpose.

GPR0-GPR12 are volatile registers, that is, their values are not preserved across function calls. GPR13-GPR31 are non-volatile registers. If a function wishes to overwrite these non-volatile registers, it must first save the value in the stack, and restore the value before returning.
GPR12 is also used in special handling in the glink code (We'll see what the glink code is later). In 64-bit architectures, GPR13 contains the thread pointer.

The link register (LR) holds the return address of a function call, and is updated automatically by the bl instruction. To return to the address contained in the LR, the blr instruction is used. The instruction 'mtlr' (move to link register) can be used to set the link register to an arbitrary value.

The condition register (CR) is a 32-bit register divided into eight 4-bit fields, named CR0-CR7. The results of comparisons and of arithmetic and logical operations are recorded in the condition register fields, and they can be used to perform conditional branches. In the AIX linkage convention, fields CR2-CR4 are non-volatile, while CR0, CR1 and CR5-CR7 are volatile. Hence, if a function changes any of the fields CR2-CR4, it must save their state and restore it before returning to the caller.
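
To make the field layout concrete, here is a minimal sketch in C (C rather than assembly, purely for illustration) that pulls one 4-bit field out of a 32-bit copy of the CR. The function name cr_field is hypothetical. The only fact it relies on is that PowerPC numbers bits from the most significant end, so CR0 occupies the top four bits of the register.

```c
#include <stdint.h>

/* Return the 4-bit field CRn (n = 0..7) from a 32-bit image of the
 * condition register. CR0 is the most significant nibble and CR7 the
 * least significant one. cr_field is a name made up for this sketch. */
unsigned cr_field(uint32_t cr, unsigned n)
{
    unsigned shift = 28u - 4u * (n & 7u);   /* CR0 -> 28, CR7 -> 0 */
    return (unsigned)((cr >> shift) & 0xFu);
}
```

For example, cr_field(0x12345678, 2) yields 3, the third nibble from the left.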

The Counter Register (CTR) is used to perform branches, and used in looping to hold the loop count value. The value of the counter register may be modified by the 'mtctr' (move to counter register) instruction.

Friday, November 19, 2010

PowerPC Assembly Tutorial on AIX: Chapter 3, Calling Other Functions

Suppose you want your program to wait for the user to press a key before exiting. To do this, you would call the getchar() function, which is exported by libc. Calling getchar from our program is rather straightforward; all we have to do is include the following line in our program:

        bl        .getchar

Note that in the above line, we have used .getchar instead of getchar. However, including this line alone in our program will not work, and in all probability, this program will just dump core. Do you know why?

We have seen that when we issue the bl instruction, the link register is overwritten with the address of the instruction following the bl. Hence, after bl .getchar, the link register contains the address of the next instruction in .main. After getchar() returns, when we issue blr from .main, we would not return to __start, because we have overwritten the return address that was placed in the link register when __start issued its own bl to .main.

How do we solve the problem? We create a stack frame for main, and save the link register in the frame.

Here is what the program would look like:

  1.     .set r0, 0
  2.     .set r1, 1
  3.     .set r3, 3

  4.     .extern     .getchar   # Tell the assembler that
  5.                                  # .getchar is an external symbol
  6.     .csect
  7.     .globl    .main
  8.     .main:
  9.         #### Function prolog begins ####
  10.         mflr    r0                 # Get the link register in r0
  11.         stw    r0, 8(r1)        # Save the link register
  12.         stwu    r1, -64(r1)    # Store the stack pointer, and
  13.                              # update. Create a frame of 64 bytes.
  14.         #### Function prolog ends ####

  15.         li    r3, 5
  16.         bl    .getchar
  17.         ori    r0, r0, 0          # No-op, required by loader after a 
  18.                                     # branch to an external function

  19.         #### Function epilog begins ####
  20.         addi    r1, r1, 64    # Restore the stack pointer
  21.         lwz    r0, 8(r1)      # Read the saved link register
  22.         mtlr    r0
  23.         #### Function epilog ends ####
  24.         blr

In the above program, line 4 tells the assembler that .getchar is an external symbol that is not present in the current file.

Load and store operations cannot be performed directly on the link register, and hence we have to copy the contents of the link register to a general purpose register before storing it. The mflr (move from link register) instruction in line 10 takes a register as its argument, and copies the contents of the link register to that register.

In PowerPC, the convention is to use the general purpose register r1 as the stack pointer. In line 11, we save the value of the link register at an offset of 8 bytes from the stack pointer.

In line 12, we use the instruction stwu (store word with update) to advance the stack pointer and save the old stack pointer. Here, stwu stores the value of r1 at the address r1-64, and then sets r1 to r1-64. This single instruction therefore performs both tasks, decrementing the stack pointer and storing the old stack pointer, in one go.
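
The effect of stwu can be modelled in a few lines of C. This is only a sketch: struct machine, MEM_SIZE, stwu_r1 and load_word are names made up for the illustration, memory is a small byte array, and words are stored in the host's byte order rather than in PowerPC's big-endian order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 1024u            /* hypothetical size of the toy memory */

struct machine {
    uint32_t r1;                  /* stack pointer, as an offset into mem */
    uint8_t  mem[MEM_SIZE];
};

/* Model of "stwu r1, disp(r1)": store the old r1 at r1+disp, then set
 * r1 to r1+disp. Returns false if the store would fall outside mem. */
bool stwu_r1(struct machine *m, int32_t disp)
{
    uint32_t old = m->r1;
    uint32_t ea  = old + (uint32_t)disp;    /* effective address */

    if (ea > MEM_SIZE - sizeof old)
        return false;
    memcpy(&m->mem[ea], &old, sizeof old);  /* save the old stack pointer */
    m->r1 = ea;                             /* ...and update r1 */
    return true;
}

/* Read back a 32-bit word from the toy memory. */
uint32_t load_word(const struct machine *m, uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &m->mem[addr], sizeof v);
    return v;
}
```

Starting with r1 = 512, stwu_r1(&m, -64) leaves r1 at 448, and the word at offset 448 holds 512, the old stack pointer.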

Having done this, we are ready to move on to the main logic of the program. We use the bl instruction in line 16 to call getchar.

Some instructions are treated specially by the assembler. The instruction in line 17 is treated as a no-op. A no-op is required by the loader after a call to an external function; we shall see why later. xlc will not compile the program without the no-op, whereas 'as' will assemble it without complaint.

Having done our job, we now have to restore the old values of the stack pointer (r1) and the link register. In line 20, we restore the stack pointer to its old value by adding the immediate value 64 to it. We then load the saved link register value into r0 in line 21, and use mtlr (move to link register) in line 22 to copy the contents of r0 to the link register. We finish by executing blr.

When we run this program, we see that it waits for us to enter something, and then returns.

So far, so good, but what happens when I check the exit value returned by this program?

$ ./a.out
a
$ echo $?
97

We are no longer getting 5!!!

A note of caution: in this post, I have not followed the stack-linkage convention in its entirety. I have simplified things and tried to capture only the essence of stack linkage. I will probably return to this topic in a later post.

PowerPC Assembly Tutorial on AIX: Chapter 2, Adding Clarity

You would have noticed that the load instruction in line 5 is rather confusing as to which is the register, and which is the value being read. To make things clearer, we will use the .set assembler pseudo-op. C programmers can think of .set as a #define. The program will then look like this:

  1.         #File. 1_2.s
  2.         .set r3, 3
  3.         .csect
  4.         .globl .main
  5.         .main:
  6.                 li        r3, 5
  7.                 blr

The above program has the same effect as 1_1.s

Thursday, November 18, 2010

PowerPC Assembly Tutorial on AIX: Chapter 1, The first steps

It's been quite some time since I last posted. In the next few posts, I'll try to present an introduction to PowerPC assembly on AIX.
The motivation for this comes from my personal experience trying to program in assembly on AIX. I found plenty of documentation on the instruction set, assembler directives etc. However, what I couldn't find was a step-by-step tutorial on how to write basic assembly programs. True, there were some developerWorks articles, but the code presented in those articles hardly ever worked.
My endeavour is to present a primer on PowerPC assembly programming on AIX. Most of my programs will be sub-optimal and simplistic. The goal is not to write perfect programs, but rather to get readers started on PowerPC assembly programming on AIX, so that they can go on from here and take advantage of the large amount of material available on the web on PowerPC programming.

The first program usually written in any programming language is the hello world program. However, writing a hello world program in assembly is certainly not the easiest first step. We will start with a much simpler program: one that does nothing, or almost nothing. The program just exits with an exit value.

The default extension of an assembly program is .s.

  1.         #File. 1_1.s
  2.         .csect
  3.         .globl .main
  4.         .main:
  5.                 li 3, 5
  6.                 blr

In this tutorial, we will use the xlc compiler to compile our first assembly program.

$ cc 1_1.s

And now, onto running this program:

$ ./a.out
$ echo $?
5

The first line in this program is a comment. Comments start with a #. A comment can be placed anywhere in a line. Any text after the # in a line is ignored by the assembler.

The second line in the program tells the assembler that this is a csect, or a relocatable module. We will learn more about csects later.

The third line tells the assembler that .main is a global symbol, and other objects can link to it.

Line 4 is a label named .main. The assembler recognizes that this is a label by the colon following the label name. Line 3 and line 4 work together to signify that .main is a global symbol, and that its address is given by the label in line 4.

In PowerPC, the convention for a function to return a value is to store it in register 3. Line 5 loads the value 5 into the register 3. 'li' is a load instruction, and loads an immediate value into a register.

In AIX, whenever a binary is run, the function __start is automatically executed. __start then calls the symbol .main.

In PowerPC, whenever one function calls another, it does so by executing the instruction bl, or branch and link. bl stores the address of the instruction that follows it in the link register, and then branches to the specified address.
Therefore, when the callee returns, execution should resume at the instruction whose address is held in the link register. To return, the callee simply executes the blr (branch to link register) instruction, which loads the contents of the link register into the program counter, and execution continues from there.
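
The interplay of bl and blr can be sketched as a tiny C model. The names struct cpu, bl and blr below are made up for the illustration, and the model tracks only the program counter and the link register. It relies on the fact that every PowerPC instruction is 4 bytes long.

```c
#include <stdint.h>

/* Toy CPU state: just the two registers involved in a call/return. */
struct cpu {
    uint32_t pc;   /* address of the instruction being executed */
    uint32_t lr;   /* link register */
};

/* bl target: remember the address of the next instruction in LR,
 * then branch to the target. */
void bl(struct cpu *c, uint32_t target)
{
    c->lr = c->pc + 4;   /* instructions are 4 bytes each */
    c->pc = target;
}

/* blr: branch to the address held in LR. */
void blr(struct cpu *c)
{
    c->pc = c->lr;
}
```

Starting from pc = 0x1000, bl(&c, 0x2000) leaves lr at 0x1004, and a subsequent blr(&c) brings pc back to 0x1004, the instruction after the call.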

More posts on PowerPC assembly to come in the following weeks.

Monday, May 11, 2009

xlc Compiler options to be used with debugger

While doing source-level debugging, one generally compiles with the -g option, and assumes that all compiler optimizations have been turned off. However, as far as the xlc compiler is concerned, this is not necessarily true. With -g, the compiler puts in line-number information and turns off some optimizations, but not all of them. To tell the compiler to turn off ALL optimizations, the -qnoopt option should be used.

Saturday, July 26, 2008

Heap usage

I haven't posted in a while, due to lack of time and too much work, and I think this post has been rather overdue.
One aspect of AIX's malloc (and, for that matter, of malloc on any other operating system) is that memory you free is not reflected in the svmon output. This is because the malloc subsystem caches the freed memory for use by later malloc calls.
An easy way to see how much memory your application is using (the memory it has malloced plus the memory in the free pool maintained by the malloc subsystem) is to use the variable process_brk, which is exported by libc.
The way I usually go about it is to use the dbx subcommand
(dbx) p &process_brk
This gives me the address to dump, which I dump using a command similar to the one below
(dbx) 0x12345678/3X
This will give an output of three words:
12345678 12345678 1
The first word signifies what the brk value was before the first malloc was done, and the second word tells you what the brk value was after the second allocation was done. The third word tells how many sbrk()s were done. This gives me a very good estimate of the total memory used by my program.
Another benefit of this is that it lets me check for heap/stack collisions. To check whether there has been a heap/stack collision in my 32-bit app, what I normally do is dump the stack pointer, and check whether it falls within the process_brk minimum and maximum limits.
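
The two checks above amount to simple arithmetic on the dumped words, which can be sketched in C as follows. The function names heap_bytes and sp_in_heap are made up for the illustration, and the sketch assumes that the first two words dumped from process_brk are passed in as brk_min and brk_max.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes between the two brk values dumped from process_brk
 * (assumed to be the first and second words of the dump). */
uintptr_t heap_bytes(uintptr_t brk_min, uintptr_t brk_max)
{
    return brk_max - brk_min;
}

/* True when the stack pointer lies inside [brk_min, brk_max],
 * which would indicate a heap/stack collision. */
bool sp_in_heap(uintptr_t brk_min, uintptr_t brk_max, uintptr_t sp)
{
    return sp >= brk_min && sp <= brk_max;
}
```

The addresses fed to these functions come straight from the dbx output; nothing here reads process_brk itself.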
Hope this helps.